
Approaching Norma

The sooner you start viewing norma as a piece of laboratory equipment for which you will have to obtain proper training, the better. Previous experience with desktop computing machines does not carry over to the cluster. You will have to shift mental gears in order to view the cluster for what it really is: a multiuser, production-oriented, number-crunching computing machine to which you have no physical access. Your aim should be to use the cluster, not to play with the cluster (that is, play with your research problems, not with the computer). Ideally, your interaction with the cluster should be practically non-existent: transfer your job's files to norma, log in for as long as it takes to submit your job to the queue, log out, watch your job's progress via norma's status web page and, when your job is finished, transfer the results to your machine. End of story. What follows on this page is a very quick tour aiming to convince you that there is very little you can do on norma other than calculating with norma. So, here it goes:

Assuming that you have an account and a valid IP address, log on to Norma using ssh and your password (you may need to use the fully qualified name for norma, norma.mbg.duth.gr):
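A minimal sketch of the login step. The account name yourlogin and the helper name norma_login are placeholders made up for illustration; only the host name comes from the text above.

```shell
# Fully qualified host name, as given in the text above.
NORMA_HOST="norma.mbg.duth.gr"
# Hypothetical account name: replace it with the login you were given.
NORMA_USER="${NORMA_USER:-yourlogin}"

# Illustrative helper: opens an interactive ssh session on norma.
norma_login() {
    ssh "${NORMA_USER}@${NORMA_HOST}"
}

# Only print the command here; call norma_login to actually connect.
echo "ssh ${NORMA_USER}@${NORMA_HOST}"
```

Calling norma_login from your own machine will prompt for your password.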

Do note the 'Report problems …' clause shown at login, and do report problems (though not immediately: check, and then check again, that it is indeed a machine-induced problem). There are two ways to report problems. If you feel comfortable with unix mail, go ahead and type Mail root from the shell. If you'd rather avoid unix mail, use your browser to submit your report via the contact page.

Now it is time to break the ice between you and Norma, so tell her 'top' and press the '1' key. You should see how relaxed the head node is (which is as it should be: jobs run on the compute-only nodes, not the head node):

Note that what top will show you is just the processes owned by you (and not all processes currently present on the system).

The next task is to determine how many nodes are up, and whether they are busy. Possibly the easiest way is to look at the footer of any of Norma's wiki pages: the presence and color of the eight boxes shown there reflect what the cluster is doing. Another way is to use the command 'wwtop' (which stands for 'warewulf top') from the command line. On a busy day with all nodes up and running, wwtop will show something like this:

Possibly the most important difference between using a cluster and using a desktop machine is the queueing system: if you want to get some work done on the cluster, you will have to join a queue. The idea is that you do not run your job yourself; you submit it to something called a queue manager. If there are enough resources (that is, CPUs and physical memory) available on the cluster, the queue manager will arrange for your job to start executing immediately. If the resources currently available do not suffice (because other people are using the cluster at the time), your job will just have to sit there and wait until the resources it needs become available (i.e. until the other people's jobs have finished executing). The queue manager responsible for Norma is called slurm and you'd better start getting used to him:
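To make the idea of submitting rather than running more concrete, here is a minimal sketch of a Slurm batch script. The job name, the output file name and the placeholder workload are all hypothetical, and the options your job needs on norma may well differ, so treat the howto's as the authority.

```shell
#!/bin/sh
# Lines starting with #SBATCH are read by the queue manager, not by the shell.
# Hypothetical job name, as it would appear in the queue listing:
#SBATCH --job-name=example
# Ask for a single task (one CPU core):
#SBATCH --ntasks=1
# Hypothetical file that collects the job's output:
#SBATCH --output=example.out

# The actual work: here just a placeholder that reports where it ran.
node=$(uname -n)
echo "job ran on ${node}"
```

Saved as run.sh (a made-up name), such a script would be handed to the queue manager with sbatch run.sh rather than executed directly.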

Slurm

For a well-informed view of what is happening on the cluster you will have to use the slurm-provided tools. Slurm comprises quite a number of programs and tools (see the quickstart users' documentation), but keeping it simple you can just use sinfo and squeue to see what (if anything) is running on the cluster. For an idling cluster:

On a busy day you will see something like this:
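Both views come from the same two commands. The sketch below assumes nothing beyond sinfo and squeue being on the PATH once you are logged on to norma; the variable name slurm_available is made up for illustration.

```shell
# Report cluster state with the two Slurm tools named above.
# Off the cluster the tools are absent, so the sketch only says so.
if command -v sinfo >/dev/null 2>&1 && command -v squeue >/dev/null 2>&1; then
    slurm_available=yes
    sinfo                     # partitions (queues) and node states
    squeue                    # all pending and running jobs
    squeue -u "$(id -un)"     # only your own jobs
else
    slurm_available=no
    echo "sinfo/squeue not found: run this after logging on to norma"
fi
```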

For a more graphical (so to speak) presentation, type smap. You will get something similar to this:

If you are a GUI type of person, run sview from a terminal. This will open a graphics window showing the current state of the cluster, both with respect to available queues (partitions) and running jobs. With an idling cluster you should see something like:

If there are jobs running expect something similar to these:

If you do not have a computing job to submit to the cluster, then what you've done so far is almost everything that you can do. It is time to log out and start preparing your files for your job, but before that:

A productive, trouble-free session with Norma

Read the page about the rules of engagement.
Prepare the files for your job and test them on your machine.
Use sftp or ftp to transfer them to norma.
Read the howto's about how to submit your job(s).
Submit your job and make sure it appears in the queue (type squeue). When the required resources (number of nodes) become available, your job will be transferred and start executing.
Keep an eye on it while running.
When the job finishes use an ftp client that can transfer recursively whole directories to transfer your results to your machine.
Verify, back up and verify again your data (working on your machine).
Go back to norma and clean up your home directory.
You are done.
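The steps above can be sketched as a dry run from your own machine. Everything named here is an assumption made for illustration: the account yourlogin, the directory myjob, the script run.sh and the run helper are hypothetical; scp -r stands in for the recursive ftp client mentioned above; and sbatch is Slurm's usual submission command, which the howto's may wrap differently on norma.

```shell
NORMA="yourlogin@norma.mbg.duth.gr"   # hypothetical account name
JOB_DIR="myjob"                       # hypothetical local job directory
JOB_SCRIPT="run.sh"                   # hypothetical batch script inside it

# Print each step instead of running it, unless DRY_RUN=0 is set.
# (The printed line is for reading; its quoting is flattened.)
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run scp -r "$JOB_DIR" "$NORMA:"                       # transfer the job files
run ssh "$NORMA" "cd $JOB_DIR && sbatch $JOB_SCRIPT"  # submit to the queue
run ssh "$NORMA" squeue                               # check that it is queued
run scp -r "$NORMA:$JOB_DIR" "./${JOB_DIR}_results"   # fetch results when done
# Cleaning up on norma is left as a manual step, to be done only
# after the results have been verified and backed up.
```

With the default dry run nothing touches the network; setting DRY_RUN=0 makes the same lines execute for real.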