Feb 18th, 2015

The problems with n0001 (or is it the switch?) continue. The major symptom was that the node hung as soon as a job was started. The node was subjected to stand-alone memory and CPU testing, which showed no problems. Then the switch ports of n0001 and n0008 were swapped. During the first test, the node hung again. Then (without changing anything else) it behaved, and the job ran without problems. At the next power failure I'll try to cold-start everything in the cluster room.

2015/02/18 13:45

Feb 6th, 2015

The power failures continue. To top it off, there is significant icing on the A/C unit. Wait for northerly winds on Saturday … :-/

2015/02/06 16:21

The full maintenance archive is kept here

…and finally, The infamous MBG's Power Failure Log

about/maintenance.txt · Last modified: 2011/01/31 17:56 (external edit)