At last, a solution to the problem of DCD files appearing corrupted and lagging behind the simulation. The (now obvious) answer is that syncing the head node is not enough: the compute nodes (each with 4 GB of memory) cache results for a very long time before flushing them out to the head node's disks. The current working solution is a crontab entry on the compute nodes that runs /bin/sync every five minutes, which does not seem to affect performance.
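For reference, an entry along these lines would do the job. This is a minimal sketch that assumes standard five-field cron syntax and that the line goes into a user crontab (e.g. via `crontab -e`) on every compute node, not just the head node. The exact entry used here was not recorded, so treat it as illustrative.

```crontab
# Flush dirty page-cache data to disk every five minutes (minute field */5).
# In a user crontab (crontab -e) there is no user field:
*/5 * * * * /bin/sync
# In /etc/crontab or /etc/cron.d/*, a user field is required, e.g.:
# */5 * * * * root /bin/sync
```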