It seems that this temperature-monitoring script does work:
May 21 20:02:11 norma logger: [temp] Alert, current average core temperature is 59 deg C
May 21 20:02:11 norma logger: [temp] Temperature alert, taking everything down now ...
May 21 20:02:11 norma logger: [temp] Issued shutdown to n0001
May 21 20:02:11 norma logger: [temp] Issued shutdown to n0002
May 21 20:02:11 norma logger: [temp] Issued shutdown to n0003
May 21 20:02:11 norma logger: [temp] Issued shutdown to n0004
May 21 20:02:11 norma logger: [temp] Issued shutdown to n0005
May 21 20:02:11 norma logger: [temp] Issued shutdown to n0006
May 21 20:02:11 norma logger: [temp] Issued shutdown to n0007
May 21 20:02:11 norma logger: [temp] Issued shutdown to n0008
May 21 20:02:11 norma logger: [temp] Issued shutdown to n0009
May 21 20:02:11 norma logger: [temp] All nodes taken down. Now the server ...
May 21 20:02:11 norma shutdown[19421]: shutting down for system halt
May 21 20:02:14 norma init: Switching to runlevel: 0
May 21 20:02:17 norma wulfd[3479]: Terminating with exit code 254
May 21 20:02:24 norma xinetd[2834]: Exiting...
May 21 20:03:18 norma kernel: Kernel logging (proc) stopped.
May 21 20:03:18 norma kernel: Kernel log daemon terminating.
May 21 20:03:19 norma exiting on signal 15
and then …
May 21 23:07:58 norma syslogd 1.4.1: restart.
May 21 23:07:58 norma kernel: klogd 1.4.1, log source = /proc/kmsg started.
May 21 23:07:58 norma kernel: Inspecting /boot/System.map-2.6.26.5-2.nsa1
May 21 23:07:59 norma kernel: Loaded 25661 symbols from /boot/System.map-2.6.26.5-2.nsa1.
May 21 23:07:59 norma kernel: Symbols match kernel version 2.6.26.
.....
May 21 23:13:21 norma logger: [WAKE-ME-UP] First reading is 32 deg C. Going to sleep now ...
.....
May 21 23:23:22 norma logger: [WAKE-ME-UP] Second reading is 31 deg C.
May 21 23:23:22 norma logger: [WAKE-ME-UP] Not cold enough ? Better do nothing. Bye.
.....
Sounds like a broken A/C unit, but it is too late on a Saturday to check. Given that the temperature is going up, I am taking the head node down (remotely) as well … (and in case I don't see you again, good morning, good evening and good night).
Sunday's update: it was a false alarm, in that the A/C didn't fail. The temperatures on the second attempt were high because the nodes had failed to shut down properly. I modified the shutdown script and brought everything back up.
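For the record, the decision logic visible in the logs above can be sketched roughly as follows. This is not the actual script, only a minimal Python illustration of the behaviour the log lines suggest. The function names and both threshold values are hypothetical: the logs show only that an average of 59 deg C raised the alert and that readings of 32 and 31 deg C counted as "not cold enough".

```python
from statistics import mean

# Hypothetical thresholds; the real script's values are not shown in the logs.
ALERT_THRESHOLD_C = 55  # an average of 59 deg C is known to trigger the alert
WAKE_THRESHOLD_C = 25   # readings of 32 and 31 deg C are known to be too warm


def should_shut_down(core_temps_c, threshold_c=ALERT_THRESHOLD_C):
    """Return True when the average core temperature reaches the threshold."""
    return mean(core_temps_c) >= threshold_c


def shutdown_order(nodes, server="norma"):
    """Compute nodes go down first, the server last, as in the [temp] log."""
    return list(nodes) + [server]


def safe_to_wake(first_c, second_c, threshold_c=WAKE_THRESHOLD_C):
    """Require both readings (taken some minutes apart) to be cool enough."""
    return first_c < threshold_c and second_c < threshold_c


if __name__ == "__main__":
    nodes = [f"n{i:04d}" for i in range(1, 10)]  # n0001 .. n0009
    if should_shut_down([58, 60, 59]):
        for host in shutdown_order(nodes):
            print(f"[temp] would issue shutdown to {host}")
    print("wake up?", safe_to_wake(32, 31))
```

Given Sunday's finding, issuing a shutdown is clearly not the same as a node actually being down, so a check that each node has really halted before the server goes down would be a sensible addition to something like this.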