about:maintenance [Norma]

Jul 3rd, 2011

Bloody UPS problems again (?). Nodes n0003 & n0004 died and then restarted successfully overnight. The logs show no sign of a power failure long enough to be recorded. The other two nodes connected to the same UPSs (n0007 on UPS4 and n0008 on UPS5) stayed up and running. The load on all UPSs is the same (~80%). Current working hypothesis: even very short power disturbances are enough to kill the two GPU-loaded nodes. But why wouldn't that also be the case for n0001 & n0002? :-/

Ignore the above (???). The chassis fans on nodes n0003, n0004 & n0006 are not functional. Could this be it? Will have to wait …

UPS1	head node + switches + DAT tape

UPS2	n0001 + n0005

UPS3	n0002 + n0006

UPS4	n0003 + n0007

UPS5	n0004 + n0008 (sitting behind the cluster)

UPS6	n0009 (sitting to the left of i7)
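
For cross-checking which loads share a UPS with a node that went down, the table above can be written out as a small lookup. This is only a sketch: the names UPS_LOADS, ups_of and surviving_peers are made up for illustration, and the list of restarted nodes is typed in by hand from the entry above rather than read from any log.

```python
# UPS-to-load mapping copied from the table above.
UPS_LOADS = {
    "UPS1": ["head node", "switches", "DAT tape"],
    "UPS2": ["n0001", "n0005"],
    "UPS3": ["n0002", "n0006"],
    "UPS4": ["n0003", "n0007"],
    "UPS5": ["n0004", "n0008"],
    "UPS6": ["n0009"],
}


def ups_of(node):
    """Return the UPS feeding `node`, or None if it is not in the table."""
    for ups, loads in UPS_LOADS.items():
        if node in loads:
            return ups
    return None


def surviving_peers(failed):
    """For each UPS feeding a failed node, list the loads on it that stayed up."""
    failed = set(failed)
    result = {}
    for node in sorted(failed):
        ups = ups_of(node)
        if ups is not None:
            result[ups] = [n for n in UPS_LOADS[ups] if n not in failed]
    return result


# Nodes that died and restarted (entered by hand from the log entry).
restarted = ["n0003", "n0004"]
print(surviving_peers(restarted))
```

With the two restarted nodes from this entry, the lookup reports n0007 (UPS4) and n0008 (UPS5) as the loads that stayed up on the affected UPSs.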

Jun 27th, 2011