Rules of engagement


Thou shalt not

  1. Run jobs or do any serious calculation on the head node. That's why the compute-only nodes exist.
  2. Log on directly to the compute nodes (i.e. you shan't do anything like ssh n000?). Use the queuing system. If the queuing system doesn't work, that is a good opportunity to report it and get it fixed.
  3. Assume that backing up your data is someone else's responsibility. It isn't.
  4. Leave your large files sitting on the cluster's disks unnecessarily. The RAID may fail, or root may have had a very bad day (use ftp and/or sftp to copy your files off before root has an opportunity to delete them).
  5. Install software in your home directory if you can avoid it. Ask root to install your useful programs so that they are useful for all users.
  6. Try to overtake other people waiting in the queue by submitting numerous smaller jobs to the fast queue. You will not go unnoticed …


Thou shalt

  1. Document standard operating procedures (or even your personal operating procedures) on the wiki. Share your protocols and findings with others: it is not just good science, it is your opportunity to debug them and discover possible problems, mistakes, omissions and nonsense.
  2. Report problems, especially missing shared libraries. Use the wiki's Contact, or send mail to root (mail root will do the trick).
  3. If at all possible, prepare all the files needed for your job on your own machine. When everything looks OK, ftp them across and start your job. You wouldn't want to see root's icy eyes up close and personal because a parallel 32-core job can't start while your series of 5-minute serial jobs is stacked up in the queue. There is an important exception to this rule: you should tune and calibrate your calculations for Norma. The quantitative way to do this is to calculate the job's parallel efficiency as described here. For large and expensive clusters the rule of thumb is never to go below 70% efficiency, but we (the utmost periphery universities) can bend that a bit ;-)
  4. Organise your calculations beforehand. It is bad practice to keep submitting 4- or 8-core jobs one by one, week after week. Collect the calculations you intend to perform and submit them together as, for example, 8 jobs each using 4 cores.
  5. Keep your files and directories well organised and tidy. Use mkdir, but without overdoing it. Choose short and informative directory names. Keep the number of directories in your top (home) directory below ten.
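
The parallel efficiency mentioned in rule 3 can be sketched in a few lines of Python. This is only an illustration: it assumes the usual textbook definition (speedup divided by the number of cores, i.e. T(1) / (p * T(p))), which may differ in detail from the description linked above. The function name and the timings below are hypothetical, not measurements from Norma.

```python
def parallel_efficiency(serial_seconds, parallel_seconds, cores):
    """Return speedup / cores, i.e. T(1) / (cores * T(cores)).

    Assumes the common textbook definition of parallel efficiency.
    """
    if cores < 1 or serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("timings must be positive and cores must be >= 1")
    speedup = serial_seconds / parallel_seconds
    return speedup / cores


# Hypothetical wall-clock timings (in seconds) of the same short test job,
# keyed by the number of cores it was run on.
timings = {1: 3600.0, 4: 1000.0, 8: 560.0, 16: 350.0, 32: 250.0}

for cores, seconds in sorted(timings.items()):
    eff = parallel_efficiency(timings[1], seconds, cores)
    note = "" if eff >= 0.70 else "  (below the 70% rule of thumb)"
    print(f"{cores:2d} cores: efficiency {eff:.0%}{note}")
```

Time a short, representative test case on 1, 4, 8, ... cores, feed the wall-clock times to something like the above, and pick the largest core count whose efficiency you can still defend.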


Enjoy,

research/howto/rules_of_engagement.txt · Last modified: 2009/03/20 19:16 (external edit)