n0003 and n0004 keep dying unexpectedly. Is it the hardware, the UPSs, the new GPUs, or their power supplies? Noted for these two nodes: “to be on the safe side replaced their power supplies with two 550W units”. Could the replacement units be the cause?
maintenance/jul_20th_2011.txt · Last modified: 2011/07/20 11:51 (external edit)