May 4th, 2010

Pilot-induced oscillation (cluster crash): I typed /etc/init.d/network stop in the wrong terminal (the one connected to norma instead of the box I was setting up), and then forgot about it. A few minutes later slurm officially declared the nodes dead and killed all jobs. It was only fair that I had to restart all the jobs late in the evening :-?

