Calculate and plot frequencies of transitions between conformers

The idea is to use the results of a cluster analysis performed with carma to (a) calculate the frequencies of transitions between distinct clusters, and (b) painlessly prepare a plot illustrating these transitions.

The following uses the carma.clusters.dat file (or the equivalent from grcarma) to (a) calculate the frequencies in the form of an adjacency matrix, and (b) prepare a circos plot with the online server at http://mkweb.bcgsc.ca/tableviewer/ . The code (at the end of this page) is straightforward: it just counts the number of times that two consecutive lines of the file belong to different clusters. First an example of using the program, then the code.

~> head dPCA.clusters.dat
       1   6   -1.4581553    1.4341596    0.3003940
       2   6   -1.4706465    1.4843552    0.2985485
       3   6   -1.3735210    1.2062018    0.4238326
       4   6   -1.4095449    1.2722338    0.3965123
       7   6   -1.3537384    1.3221643    0.3482791
       8   6   -1.3943268    1.3224407    0.3905717
       9   6   -1.4561417    1.3272395    0.3602288
      10   6   -1.3478192    1.2956438    0.4218862
      11   6   -1.4437813    1.3226526    0.3328353
      13   6   -1.4137516    1.2818775    0.3312682
~>
~> wc dPCA.clusters.dat
  2894228  14471140 150499856 dPCA.clusters.dat
~>
~> ./a.out < dPCA.clusters.dat
  1 828965
  2 627297
  3 360754
  4 456045
  5 310783
  6 144079
  7 128427
  8  37878
 
   data       A       B       C       D       E       F       G       H
      A       0      99       1       0     297       0       9      13
      B      91       0     240      58      12       1       0       0
      C       6     238       0       1     289      12       1       0
      D       1      56       2       0       0     116       1       6
      E     307       8     285       0       0       1       7       0
      F       0       0      17     118       1       0      56       2
      G       5       0       2       2       9      62       0      42
      H       9       1       0       3       0       2      48       0
~>

The first part of the output lists each cluster number followed by its size (the number of lines in the file assigned to that cluster). The second part is the transition matrix, in which the clusters are labelled with letters (A is cluster 1, B is cluster 2, and so on). Be careful: the transitions A ⇒ B and B ⇒ A are not equivalent (and, consequently, the matrix is not symmetric). The 'from' cluster is shown in the first column of the matrix, the 'to' cluster in the first row. For example, the E ⇒ A transition was observed 307 times, the A ⇒ E transition 297 times.

To plot the data, save the matrix to a file, go to http://mkweb.bcgsc.ca/tableviewer/ , upload the table, and enjoy the result.

A couple of things you should note:

The proportions of the clusters in the plot depend on the total number of transitions observed for each cluster and not on the size of the cluster. You can change that if you want by manually adding an extra column and row with the cluster sizes, as shown in 'Sample 7' of http://mkweb.bcgsc.ca/tableviewer/samples

You can change the colors as described in 'Samples 9 & 10' of http://mkweb.bcgsc.ca/tableviewer/samples

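And, finally, the code [gets() was removed from the C standard in C11, so the version below reads its input with fgets()]:

#include <stdio.h>
#include <stdlib.h>
 
 
int size[100];          /* size[i]     : number of lines assigned to cluster i */
int trans[100][100];    /* trans[i][k] : number of observed i => k transitions */
 
int main( void )
{
    char    s[5000];
    int     frame, cluster;
    int     i, k;
    int     previous=-1;
    int     max;
 
    while( fgets( s, sizeof( s ), stdin ) != NULL )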
    {
        if ( sscanf( s, "%d %d", &frame, &cluster) != 2 )
        {
            printf("Malformed input file ??? Abort.\n");
            exit( 1 );
        }
 
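        /* Guard the fixed-size arrays against out-of-range cluster numbers. */
        if ( cluster < 0 || cluster >= 100 )
        {
            printf("Cluster number out of range (0-99) ??? Abort.\n");
            exit( 1 );
        }
 
        size[ cluster ]++;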
 
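        /* First line of the file: there is nothing to compare with yet. */
        if ( previous < 0 )
        {
            previous = cluster;
        }
 
        /* The cluster changed since the previous line: count one transition. */
        if ( previous != cluster )
        {
            trans[ previous ][ cluster ]++;
            previous = cluster;
        }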
 
    }
 
    max = 0;
    for ( i = 0 ; i < 100 ; i++ )
        if ( size[ i ] > 0 )
        {
            printf("%3d %6d\n", i, size[ i ] );
            max = i;
        }
 
    printf("\n");
 
    printf("   data ");
    for ( i=1 ; i <= max ; i++ )
        printf("%7c ", 64+i );
    printf("\n");
 
    for ( i=1 ; i <= max ; i++ )
    {
        printf("%7c ", 64+i );
        for ( k=1 ; k <= max ; k++ )
            printf("%7d ", trans[ i ][ k ] );
        printf("\n");
 
    }
 
    return 0;
}