Calculation of chemical shifts from a molecular dynamics trajectory

This will be based on the sparta+ program. The ad-hockery (calc_shifts) shown below is an attempt to automate the procedure, but does not cure the major problem which is this: too much I/O. For each frame, the dcd must be read, a pdb file must be written, the pdb file read by sparta+, and then everything should be processed and deleted. Which means that this is slow. This is very slow. This is so slow that it takes about 24 hours to do one million frames (and this is for only 39 shifts). A sensible way to do the trick is, of course, to not use each and every frame. Do one run using every, say, 100th frame with -first 1, then another run with -first 50 and compare the means and standard deviations. If the calculation has converged, you are OK. Else, repeat with -first 25 and -first 75 etc.


Edit : this can be parallelized. Using eight threads a parallel version can process 16,000 frames in 12 minutes (160,000 frames in approximately two hours assuming that it scales linearly, which is not true). The parallel version is named calc_shits_parallel on norma.


This script must be used interactively


#!/usr/bin/perl -w
 
(@ARGV == 2) or die "Usage: calc_shifts <dcd> <psf>\n";
 
#
# How many shifts we will be collecting ?
#
 
(`carma -atmid ALLID -pdb -first 1 -last 1 $ARGV[0] $ARGV[1]` eq "" ) or die "Carma made a boo-boo. Too bad ...\n";
`sparta+ -in $ARGV[0].*.pdb >& /dev/null`;
`/bin/rm -rf $ARGV[0].*.pdb $ARGV[1].*.pdb`;
 
open ( IN, "pred.tab" ) or die "Can not open pred.tab. Usage: calc_shifts <dcd> <psf>\n";
while ( $line = <IN> )
  {
    if ( $line =~ /^FORMAT/ )
      {
        last;
      }
  }
 
$line = <IN>;
$tot = 0;
while ( $line = <IN> )
  {
    $ids[ $tot ] = substr( $line, 0, 14 );
    $tot++;
  }
 
close( IN );
 
`/bin/rm -rf *.tab`;
 
if ( $tot < 1 )
  {
    print "Too few atoms for calculating shifts. Something is wrong. Bye.\n";
    exit;
  }
 
 
print "Will be collecting data for $tot atoms. Starting ...\n";
 
#
# Will do it in sets of 400 structures ...
#
 
$first = 1;
 
print "Now processing set starting at frame          ";
 
while( 1 )
{
 
printf("HHHHHHHH%8d", $first );     # These H's are backspaces ...
 
$last = $first + 399;
 
`carma -atmid ALLID -pdb -first $first -last $last $ARGV[0] $ARGV[1]`;
 
`sparta+ -in $ARGV[0].*.pdb >& /dev/null`;
 
`/bin/rm -rf $ARGV[0].*.pdb $ARGV[1].*.pdb *_struct.tab`;
 
 
@files = glob("$ARGV[0]*.tab");
 
if ( @files == 0 )
  {
    last;
  }
 
foreach $file ( @files )
{
 
`tail -$tot $file | awk '{printf "%8.3f ", \$5}' >> SHIFTS`;
`echo >> SHIFTS`;
}
 
`/bin/rm -rf $ARGV[0]*.tab`;
 
$first += 400;
 
}
 
print "\n\n";
 
#
# Calculate means + sigmas
#
open ( IN, "SHIFTS" ) or die "Can not open SHIFTS ??? How did this happen ???\n";
 
for ( $i=0 ; $i < $tot ; $i++ )
{
	$mean= 0.0;
	$nof_lines = 0;
	$std = 0.0;
	while ( $line = <IN> )
	  {
	    @data = split( ' ', $line );
 
	    $nof_lines++;
	    $delta = $data[ $i ] - $mean;
	    $mean += $delta / $nof_lines;
	    $std += $delta * ($data[ $i ] - $mean);
	  }
 
        printf "%s    %8.4f %8.4f\n", $ids[ $i ], $mean, sqrt( $std / ($nof_lines -1));
        seek( IN, 0, 0 );        
}
 
close( IN );
 
print "\nAll done.\n\n";