n0003 and n0004 keep dying unexpectedly. Is it the hardware, the UPSs, the new GPUs, or their power supplies? For these two nodes, “to be on the safe side replaced their power supplies with two 550W units”. Could this be it?