The idea is to use the information obtained from a cluster analysis performed with carma to (a) calculate the frequencies of transitions between distinct clusters, and (b) painlessly prepare a plot illustrating these transitions.
The following is based on using the carma.clusters.dat file (or its equivalent from grcarma) to (a) calculate the frequencies in the form of an adjacency matrix, and (b) use the online server at http://mkweb.bcgsc.ca/tableviewer/ to prepare a circos plot. The code (at the end of this page) is straightforward : it just counts the number of times a transition between two different clusters is observed on consecutive lines of the file. First an example of using the program, then the code.
~> head dPCA.clusters.dat
  1 6 -1.4581553 1.4341596 0.3003940
  2 6 -1.4706465 1.4843552 0.2985485
  3 6 -1.3735210 1.2062018 0.4238326
  4 6 -1.4095449 1.2722338 0.3965123
  7 6 -1.3537384 1.3221643 0.3482791
  8 6 -1.3943268 1.3224407 0.3905717
  9 6 -1.4561417 1.3272395 0.3602288
 10 6 -1.3478192 1.2956438 0.4218862
 11 6 -1.4437813 1.3226526 0.3328353
 13 6 -1.4137516 1.2818775 0.3312682
~>
~> wc dPCA.clusters.dat
 2894228 14471140 150499856 dPCA.clusters.dat
~>
~> ./a.out < dPCA.clusters.dat
  1 828965
  2 627297
  3 360754
  4 456045
  5 310783
  6 144079
  7 128427
  8  37878

  data      A      B      C      D      E      F      G      H
     A      0     99      1      0    297      0      9     13
     B     91      0    240     58     12      1      0      0
     C      6    238      0      1    289     12      1      0
     D      1     56      2      0      0    116      1      6
     E    307      8    285      0      0      1      7      0
     F      0      0     17    118      1      0     56      2
     G      5      0      2      2      9     62      0     42
     H      9      1      0      3      0      2     48      0
~>
The first part of the output lists each cluster number together with its size (the number of frames assigned to it). The second part is the transition matrix, in which clusters 1 to 8 are labelled A to H. Be careful : the transitions A ⇒ B and B ⇒ A are not equivalent (and, consequently, the matrix is not symmetric). The "from" cluster is shown in the first column of the matrix, the "to" cluster in the first row. For example, the E ⇒ A transition was observed 307 times, the A ⇒ E transition 297 times.
To plot the data, save the matrix to a file, go to http://mkweb.bcgsc.ca/tableviewer/, upload the table, and enjoy the resulting circos plot.
A couple of things you should note :
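And, finally, the code :

#include <stdio.h>
#include <stdlib.h>

#define MAXCL 100

int size[MAXCL];            /* number of frames assigned to each cluster */
int trans[MAXCL][MAXCL];    /* trans[from][to] : observed transitions    */

int main(void)
{
  char s[5000];
  int  frame, cluster;
  int  i, k;
  int  previous = -1;       /* cluster of the previous line, -1 at start */
  int  max;

  /* gets() was removed from the language in C11, hence fgets() */
  while( fgets( s, sizeof s, stdin ) != NULL )
    {
      if ( sscanf( s, "%d %d", &frame, &cluster) != 2 )
        {
          printf("Malformed input file ??? Abort.\n");
          exit( 1 );
        }

      /* Guard the fixed-size arrays against unexpected cluster numbers */
      if ( cluster < 0 || cluster >= MAXCL )
        {
          printf("Cluster number out of range ??? Abort.\n");
          exit( 1 );
        }

      size[ cluster ]++;

      if ( previous < 0 )
        previous = cluster;

      /* A transition is counted only when the cluster changes */
      if ( previous != cluster )
        {
          trans[ previous ][ cluster ]++;
          previous = cluster;
        }
    }

  /* Cluster sizes; max ends up as the highest populated cluster number */
  max = 0;
  for ( i = 0 ; i < MAXCL ; i++ )
    if ( size[ i ] > 0 )
      {
        printf("%3d %6d\n", i, size[ i ] );
        max = i;
      }

  /* Transition matrix, clusters 1, 2, 3, ... labelled A, B, C, ... */
  printf("\n");
  printf(" data ");
  for ( i=1 ; i <= max ; i++ )
    printf("%7c ", 64+i );
  printf("\n");

  for ( i=1 ; i <= max ; i++ )
    {
      printf("%7c ", 64+i );
      for ( k=1 ; k <= max ; k++ )
        printf("%7d ", trans[ i ][ k ] );
      printf("\n");
    }

  return 0;
}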