Working Efficiently on Abacus4

llq

The status of the queue can be inspected using llq.

To see why a job is waiting and does not start, use the option -s.
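
As a small illustration of the -s option, the following sketch wraps llq in a shell function. The function name why_waiting and the check for a missing llq are hypothetical additions, not part of LoadLeveler; the job ID is whatever llq lists for your job.

```shell
# Hypothetical helper: ask LoadLeveler why a job has not started yet.
# Usage: why_waiting <job id>
why_waiting() {
    # llq is only available on the cluster itself.
    if ! command -v llq >/dev/null 2>&1; then
        echo "llq not found: run this on Abacus4" >&2
        return 1
    fi
    # -s prints the reasons why the given job is still waiting.
    llq -s "$1"
}
```

Calling why_waiting with a job ID taken from the plain llq listing then prints the scheduler's explanation for that job.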

LoadLeveler Classes

The available LoadLeveler classes can be listed with llclass. In particular, the output shows the maximum CPU time that a job in each class may consume.

If, because of maintenance, no more jobs should be started on a specific node, the class long is drained first, then medium, and finally short. For this reason, there are occasionally more resources available in the classes short and medium than in long.

In summary, the CPU time a job actually requires should be estimated as accurately as possible, so that a job which runs for only a short time does not have to compete for resources with longer-running jobs.
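
One way to arrive at such an estimate is to time a short test run, extrapolate to the full problem size, and add a modest safety margin. The sketch below only does the arithmetic: it converts an estimate in seconds plus a margin in percent into hh:mm:ss. The function name limit_hms is hypothetical, and whether the result belongs in a keyword such as wall_clock_limit, and in this format, is an assumption that should be checked against the llclass output and the local documentation.

```shell
# Hypothetical helper: turn a runtime estimate into an hh:mm:ss limit.
# Usage: limit_hms <estimated seconds> <safety margin in percent>
limit_hms() {
    est=$1
    margin=$2
    # Add the safety margin using integer arithmetic.
    padded=$(( est + est * margin / 100 ))
    # Split the padded total into hours, minutes and seconds.
    printf '%02d:%02d:%02d\n' \
        $(( padded / 3600 )) \
        $(( (padded % 3600) / 60 )) \
        $(( padded % 60 ))
}
```

For example, limit_hms 3000 20 prints 01:00:00, that is 50 minutes plus a 20 percent margin.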

man

On Abacus4, several commands, such as ls or df, are available in multiple versions: one from AIX and one from a different source, e.g. GNU. If the environment variable $MANPATH is set, then man returns only the non-AIX pages. The AIX pages can be obtained with either man C or man -M/.

Examples:
   man df        Manpage for GNU df
   man C df      Manpage for AIX df
   man -M/ df    Manpage for AIX df

MPI-Jobs

Parallel jobs on Abacus4 should in general be submitted so that a small number of tasks is run on each of several nodes, rather than a large number of tasks on a single node. Sufficient ConsumableResources are more likely to be available for such jobs and the memory utilisation for the cluster as a whole will be improved.

Thus, to run a job requiring 8 tasks, the following would be sensible choices:
   # @ node = 2
   # @ tasks_per_node = 4
or
   # @ node = 4
   # @ tasks_per_node = 2

Because the InfiniBand connection between the nodes is very fast and has very low latency, spreading a job over multiple nodes has little impact on performance.
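
The two layouts shown for 8 tasks can be derived for other task counts as well. The following sketch prints the node and tasks_per_node lines for a given total number of tasks and number of nodes. The function name ll_layout is hypothetical, and the sketch assumes the tasks should be spread evenly, so it refuses combinations that do not divide without remainder.

```shell
# Hypothetical helper: print the directives for an even task layout.
# Usage: ll_layout <total tasks> <nodes>
ll_layout() {
    total=$1
    nodes=$2
    # Only accept layouts where every node gets the same number of tasks.
    if [ "$nodes" -le 0 ] || [ $(( total % nodes )) -ne 0 ]; then
        echo "error: cannot split $total tasks evenly over $nodes nodes" >&2
        return 1
    fi
    echo "# @ node = $nodes"
    echo "# @ tasks_per_node = $(( total / nodes ))"
}
```

For example, ll_layout 8 4 prints the second layout shown above, and the output can be pasted into the job command file.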

SSH

Access to the HPC systems is only available via secure methods such as SSH and SCP.

Please note that the following only works within the FU network. If you are outside the network, you first need to set up a VPN connection (the instructions are in German).

To connect to one of the HPC systems, the following command is used:
   $  ssh <username>@<system name>.zedat.fu-berlin.de
For example:
   $  ssh smith@soroban.zedat.fu-berlin.de

You will then be asked for your ZEDAT password.

If you want to start a program on the remote system which opens a window on your Linux computer under X, use the option -X. This also works on Mac OS X, but from version 10.8 onwards the XQuartz package must be installed first.
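
The -X option goes in front of the login target. The sketch below wraps this in a shell function. The function name hpc_x11 is hypothetical, and it assumes the host name pattern shown above.

```shell
# Hypothetical helper: log in to an HPC system with X11 forwarding.
# Usage: hpc_x11 <username> <system name>
hpc_x11() {
    # -X forwards windows opened on the remote system to the local X server.
    ssh -X "$1@$2.zedat.fu-berlin.de"
}
```

For the example user above, this would be called as hpc_x11 smith soroban.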

SSHFS

With the Linux command sshfs you can set up a local directory that refers to a directory on a remote machine.

First you create a local directory on your own Linux machine, e.g.
   $ mkdir my_remote_dir
You then enter the following to mount your remote home directory on it:
   $ sshfs <username>@<system name>.zedat.fu-berlin.de: my_remote_dir
To close the connection, use:
   $ fusermount -u  my_remote_dir
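
The three steps above can be combined into a pair of shell functions. This is a minimal sketch: the names hpc_mount and hpc_umount are hypothetical, and it assumes that sshfs and fusermount are installed on your local Linux machine.

```shell
# Hypothetical helper: mount the remote home directory on a local directory.
# Usage: hpc_mount <username> <system name> <local dir>
hpc_mount() {
    user=$1
    system=$2
    dir=$3
    # Create the local mount point if it does not exist yet.
    mkdir -p "$dir" || return 1
    # An empty path after the colon refers to the remote home directory.
    sshfs "$user@$system.zedat.fu-berlin.de:" "$dir"
}

# Hypothetical helper: close the connection again.
# Usage: hpc_umount <local dir>
hpc_umount() {
    fusermount -u "$1"
}
```

A session would then consist of hpc_mount smith soroban my_remote_dir, some work on the files in my_remote_dir, and finally hpc_umount my_remote_dir.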