The assumption is that you have a set of experimental NOE-derived distances which you want to compare with the results from your simulation. For the discussion that follows you will need carma plus two extra programs: a perl script 'prep_proton.pl' and a C program 'noe_averaging'. Both of these are installed on norma, but for completeness, here is their source code.
prep_proton.pl, a perl script to calculate exhaustive table with all proton pairs
#!/usr/bin/perl -w
#
# This script will prepare a file containing a list of pairs of hydrogen
# atoms to be used with 'carma -distances'. The aim is to then use the
# distances to calculate NOEs (using the program noe_averaging.c). The list
# prepared by this script is exhaustive. I can not imagine using it for
# systems larger than oligopeptides. Use a PSF for just the peptide.
#
# PLAN: Read a PSF file. Prepare a list with all atoms whose atom name starts
# with an 'H' (ignore residues with a residue name starting with 'TIP',
# just in case). Then use a double-for to write the output file. Because
# carma only uses the first two numbers, add extra columns containing
# information about the corresponding residues, residue numbers and
# atom names.
#
if ( @ARGV == 1 )
{
if ( $ARGV[0] =~ /(\w+)\.(p|P)(s|S)(f|F)/ )
{
$outname = $1 . ".protons";
open( IN , $ARGV[0] ) or die "Can not open input file\n";
open( OUT, ">$outname" ) or die "Can not open output file\n";
}
else
{
print "Usage: prep_proton in.psf [protons.list]\n";
exit;
}
}
elsif ( @ARGV == 2 )
{
open( IN , $ARGV[0] ) or die "Can not open input file\n";
open( OUT, ">$ARGV[1]" ) or die "Can not open output file\n";
}
else
{
print "Usage: prep_proton in.psf [protons.list]\n";
exit;
}
while ( $line = <IN> )
{
if ( $line =~ /\s*(\d*)\s*\!NATOM/ )
{
$nof_atoms = $1;
last;
}
}
$tot_protons = 0;
for ( $i=0 ; $i < $nof_atoms ; $i++ )
{
$line = <IN>;
if ( $line =~ /^\s+(\d+)\s+\w*\s+(\d+)\s+(\w+)\s+(\w+)/ )
{
$atnumber = $1;
$resnumber = $2;
$resname = $3;
$atname = $4;
if ( $resname =~ /TIP/ )
{ }
else
{
if ( $atname =~ /^H/ )
{
$res_name[$tot_protons] = $resname;
$res_numb[$tot_protons] = $resnumber;
$atm_name[$tot_protons] = $atname;
$atm_numb[$tot_protons] = $atnumber;
$tot_protons++;
}
}
}
}
print "Will prepare list of all pairs for $tot_protons protons\n";
print "If this is wrong, stop the program now ...\n";
sleep(2);
$pairs = 0;
for ( $i=0 ; $i < $tot_protons-1 ; $i++ )
{
for ( $j=$i+1 ; $j < $tot_protons ; $j++ )
{
printf OUT " % 8d % 8d ", $atm_numb[ $i ], $atm_numb[ $j ];
printf OUT "% 5s% 4d %-5s - % 5s% 4d %-5s\n", $res_name[$i], $res_numb[$i], $atm_name[$i], $res_name[$j], $res_numb[$j], $atm_name[$j];
$pairs++;
}
}
print "Wrote $pairs pairs.\n";
noe_averaging.c, a C program to calculate (r)^-3 and (r)^-6 averaged distances
/*****************************************************************************
INPUT: A 'carma.distances' file as produced by carma. If it is not a carma
formatted file, you should not use this program.
OUTPUT: Two numbers for each distance-containing column of the input
file corresponding to the (r)^-3 and (r)^-6 averaged
distances (for use with NOE data).
*****************************************************************************/
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
int main(argc,argv)
int argc;
char *argv[];
{
FILE *in;
char c;
char string[100];
int columns;
int junk;
int i;
long double r6, r3;
float val;
int nof_values;
int fraction;
double *data_r3;
double *data_r6;
setlinebuf( stdout );
in = fopen( argv[1], "r" );
if ( in == NULL )
{
printf("Can't open file %s for reading. Abort.\n", argv[1] );
exit(1);
}
/* How many columns we have on a line ? You won't believe this ... */
columns = 0;
while( fscanf(in, "%c", &c) == 1 )
{
if ( c == '\n' )
break;
if ( c == '.' )
columns++;
}
fprintf(stderr, "Found %d columns with distances.\n", columns );
fprintf(stderr, "Allocating memory ...");
data_r3 = (double *)(calloc( columns, sizeof(double)));
data_r6 = (double *)(calloc( columns, sizeof(double)));
if ( data_r3 == NULL || data_r6 == NULL )
{
printf("Failed to allocate %ld bytes of memory. Abort.\n", (long int)(2*columns*sizeof(double)));
exit(1);
}
fprintf(stderr, " done.\nNow reading line : ");
rewind( in );
nof_values = 0;
fprintf(stderr, "%8d", nof_values );
while( 1 )
{
if ( fscanf(in, "%s", &string[0]) != 1) /* skip frame number */
break;
for ( i=0 ; i < columns ; i++ )
if ( fscanf(in, "%f", &val) != 1 )
{
printf("Ooops. Corrupt input file (or not from carma ?). Abort.\n");
exit(1);
}
else
{
data_r3[i] += 1.0l / (val*val*val);
data_r6[i] += 1.0l / (val*val*val*val*val*val);
}
nof_values++;
fprintf(stderr, "HHHHHHHH%8d", nof_values ); /* These H's are actually backspaces */
}
printf("\nColumn r^-3 r^-6\n");
for ( i=0 ; i < columns ; i++ )
{
printf("% 5d % 10.6f% 10.6f \n", i+1, (double)(pow((data_r3[i]/nof_values),-1.0l/3.0l)),
(double)(pow((data_r6[i]/nof_values),-1.0l/6.0l)) );
}
}
noe_averaging.c, old (and extremely slow version)
/*****************************************************************************
INPUT: A 'carma.distances' file as produced by carma. If it is not a carma
formatted file, you should not use this program.
OUTPUT: Three numbers for each distance-containing column of the input
file. The first two columns are the (r)^-3 and (r)^-6 averaged
distances (for use with NOE data). The last column is the fraction
of total frames for which the corresponding (r) is less than 5.5A.
WHY IS IT SO SLOW: Because we do not store anything at all in memory, not
even the equivalent of one line of input. The good news
is that this program is expected to work even if you feed
it a file as big as your disk (so to speak).
*****************************************************************************/
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
int main(argc,argv)
int argc;
char *argv[];
{
FILE *in;
char c;
char string[100];
int columns;
int junk;
int now_reading;
int i;
long double r6, r3;
float val;
int nof_values;
int fraction;
setlinebuf( stdout );
in = fopen( argv[1], "r" );
if ( in == NULL )
{
printf("Can't open file %s for reading. Abort.\n", argv[1] );
exit(1);
}
/* How many columns we have on a line ? You won't believe this ... */
columns = 0;
while( fscanf(in, "%c", &c) == 1 )
{
if ( c == '\n' )
break;
if ( c == '.' )
columns++;
}
printf("Found %d columns with distances.\n", columns );
printf("We will do this the slow way ...\n");
if ( columns > 100 )
printf("and with %d columns, you'd better like it slow.\n", columns );
printf("\nColumn r^-3 r^-6 Fraction of frames with r(ij)<5.5A\n");
rewind( in );
now_reading = 1;
while ( now_reading <= columns )
{
for ( i=0 ; i < now_reading ; i++ ) /* skip unwanted columns on the first line */
fscanf(in, "%s", &string[0]);
r3 = 0.0;
r6 = 0.0;
nof_values = 0;
fraction = 0;
while( fscanf(in, "%f", &val) == 1 )
{
r3 += 1.0l / (val*val*val);
r6 += 1.0l / (val*val*val*val*val*val);
nof_values++;
if ( val <= 5.50 )
fraction++;
for ( i=0 ; i < columns ; i++ ) /* Go to next line */
fscanf(in, "%s", &string[0]);
}
printf("% 5d % 10.6f% 10.6f ", now_reading, (double)(pow((r3/nof_values),-1.0l/3.0l)),
(double)(pow((r6/nof_values),-1.0l/6.0l)) );
printf(" % 5.3f\n", (float)(fraction)/nof_values );
now_reading++;
rewind(in);
}
}
There are two basic scenarios we need to consider:
You want to compare only the observed NOE-derived distances with the calculated.
You not only want to compare the observed NOE-derived distances with the calculated, but also want to see how many of the expected (from the simulation) distances have indeed been observed. This is based on
this paper from the Gunsteren group.
For scenario (1) above, you will have to do some typing and pattern matching. Scenario (2) will require less typing but quite a lot of calculus (and can only be applied to small-to-tiny oligopeptides).
Scenario 2:
Get a copy of your peptide-only PSF file.
Use the script prep_proton.pl with your PSF to prepare an initial list of all possible proton pairs.
Note the number of pairs reported from prep_proton.pl and reflect on the following example: an 11-residue long peptide with a total of 88 hydrogens, gave a grant total of 3828 pairs (unedited result from prep_proton.pl
). Storing the distances for these pairs and for only ten thousand frames, resulted to a carma.distances
file of size 371 Mbytes (it would have been 3.7 Gbytes for 100,000 frames). The take-home message is simple: just running prep_proton.pl
, carma
, and noe_averaging
may get you in trouble. You will probably have to edit the proton list prepared from prep_proton.pl
and remove all pairs that you know that are either too far apart, or otherwise equivalent or redundant.
Now that you have your edited list of proton pairs, use it with your latest version of carma, using something like carma -v -segid A -atmid ALLID -dist proton.list my.dcd my.psf
. Carma will produce a file named carma.distances
in your current working directory.
Feed the carma.distances
file to noe_averaging
to get your r^-3 and r^-6 averaged distances.
You have your distances, but (of course) can't remember which distance is for which pair ? Fear not, prep_proton.pl
has added this information in your proton list file. Use the unix command paste
to merge the output from noe_averaging
and prep_proton.pl
. You are ready. You can even use awk
to throw away all lines with an r^-6 averaged distance of less than 5.5A.
Scenario 1:
Everything is the same as above with one notable exception: after using the prep_proton.pl
script to prepare the list of hydrogen pairs, you must remove all lines that do not have an experimentally observed distance bound. If you have a small number of pairs, it may turn out to be faster to type the list yourself.
Example
# mkdir tmp
# cd tmp
# cp /somedirectory/pept.psf .
# prep_proton.pl pept.psf
# carma -v -segid A -atmid ALLID -distance pept.protons -last 50000 my.dcd my.psf
# noe_averaging carma.distances | tee LOG
# ed LOG => remove the header lines
# paste LOG pept.protons > merged
# awk '$3 < 5.5' merged > selected
# ed selected => remove redundant entries