A/C unit installed. The temperatures recorded initially were quite good (that is, low). Later in the evening, after equilibration, they were not brilliant. Running a temperature monitoring script to see how it goes. Needless to say, nefeli's A/C was immediately switched off, with marked results:
Hopefully, even if the A/C in norma fails, the crontab should save the day:
crontab script for emergency cluster shutdown
#!/usr/bin/perl -w
# Run from cron: average the core temperatures of the server and all
# nodes, and halt the whole cluster if the average is too high.

# One reading per line, e.g. "+45.0°C" (field 4 of the pdsh output).
system("/usr/bin/pdsh -w norma,n0001,n0002,n0003,n0004,n0005,n0006,n0007,n0008 sensors | grep 'Core ' | awk '{print \$4}' > /tmp/temp");
open FILE, "/tmp/temp" or die $!;
$sum = 0.0;
$count = 0;
while ( <FILE> )
{
    # Keep only the number, so that -w does not warn about "°C".
    next unless /([-+]?\d+(?:\.\d+)?)/;
    $sum += $1;
    $count++;
}
close FILE;
system("/bin/rm /tmp/temp");
# No readings at all (pdsh or sensors failed): avoid dividing by zero.
if ( $count == 0 )
{
    system("logger '[temp] No core temperature readings, cannot compute average'");
    exit 1;
}
if ( ($sum/$count) > 54 )
{
    $temp = int($sum/$count);
    system("logger '[temp] Alert, current average core temperature is $temp deg C'");
    system("logger '[temp] Temperature alert, taking everything down now ...'");
    # Nodes n0001 to n0008 live at 10.0.0.11 to 10.0.0.18.
    foreach $n (1..8)
    {
        system("ssh 10.0.0.1$n \"/sbin/shutdown -h now\"");
        system("logger '[temp] Issued shutdown to n000${n}'");
    }
    system("logger '[temp] All nodes taken down. Now the server ...'");
    system("/sbin/shutdown -h now \&");
}
# else
# {
#     $temp = int($sum/$count);
#     system("logger '[temp] Current average core temperature is $temp degrees C'");
# }
exit;
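The script above acts on the average over all cores, so one hot node can hide behind several cool ones. Below is a minimal sketch of doing the parsing in Perl itself and reporting the hottest core alongside the mean. It assumes the pdsh/sensors lines look like `n0001: Core 0: +45.0°C (...)`, which is the layout implied by the `awk '{print $4}'` above. The sub names `core_temp` and `summarise` and the sample readings are made up for illustration and are not part of the cron script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the temperature out of one pdsh/sensors line such as
#   "n0001: Core 0:      +45.0°C  (high = +80.0°C, crit = +100.0°C)"
# Returns undef for lines that carry no "Core N:" reading.
sub core_temp {
    my ($line) = @_;
    return $line =~ /Core\s+\d+:\s*([-+]?\d+(?:\.\d+)?)/ ? $1 : undef;
}

# Summarise a list of lines as (number of readings, mean, maximum).
# Returns an empty list when nothing could be parsed, so the caller
# never divides by zero.
sub summarise {
    my @temps = grep { defined } map { core_temp($_) } @_;
    return unless @temps;
    my ($sum, $max) = (0, $temps[0]);
    for my $t (@temps) {
        $sum += $t;
        $max = $t if $t > $max;
    }
    return (scalar @temps, $sum / @temps, $max);
}

# Made-up sample readings, for illustration only.
my @sample = (
    "n0001: Core 0:      +45.0°C  (high = +80.0°C, crit = +100.0°C)",
    "n0001: Core 1:      +47.0°C  (high = +80.0°C, crit = +100.0°C)",
    "n0002: Core 0:      +70.0°C  (high = +80.0°C, crit = +100.0°C)",
    "n0002: Adapter: ISA adapter",
);
my ($n, $mean, $max) = summarise(@sample);
printf "%d readings, mean %.1f, max %.1f\n", $n, $mean, $max;
```

With these sample lines the mean comes out at exactly 54.0, which would not trip the `> 54` test even though one core sits at 70. Whether a per-core limit is worth adding depends on how evenly the room heats up.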
Discussion