The ApoA1 benchmark is used as distributed by the NAMD developers. The tests were performed on identical machines, each based on an Intel Q6600 with an NVIDIA GTX460, connected over a gigabit interconnect. All measurements are in nanoseconds per day.
Nodes/Cores/GPUs | With CUDA | Without CUDA | Times faster with CUDA |
---|---|---|---|
1 / 4 / 1 | 1.11 | 0.22 | 5.04 |
2 / 8 / 2 | 1.79 | 0.42 | 4.25 |
4 / 16 / 4 | 2.27 | 0.77 | 2.94 |
For comparison, the whole quad cluster (8 nodes, 32 cores, no CUDA) produces 1.21 nanoseconds per day, whereas a single i7+GTX295 machine produces 1.73 nanoseconds per day.
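The speedup column, and the way scaling falls off as nodes are added, can be recomputed from the ApoA1 table with a few lines of Python. This is only a sketch: the names `APOA1`, `cuda_speedup`, `scaling_efficiency` and `summarize` are illustrative and not part of NAMD, and "efficiency" here is assumed to mean throughput on N nodes divided by N times the single-node throughput. Because the table entries are rounded, the recomputed ratios can differ from the published column in the last digit.

```python
# ns/day figures copied from the ApoA1 table: nodes -> (with CUDA, without CUDA)
APOA1 = {
    1: (1.11, 0.22),
    2: (1.79, 0.42),
    4: (2.27, 0.77),
}


def cuda_speedup(with_cuda, without_cuda):
    """How many times faster the CUDA run is than the CPU-only run."""
    return with_cuda / without_cuda


def scaling_efficiency(rate_n, rate_1, nodes):
    """Throughput on `nodes` nodes relative to perfect linear scaling from one node."""
    return rate_n / (rate_1 * nodes)


def summarize(table):
    """Return (nodes, speedup, efficiency with CUDA, efficiency without CUDA) rows."""
    base_gpu, base_cpu = table[1]
    rows = []
    for nodes in sorted(table):
        gpu, cpu = table[nodes]
        rows.append((
            nodes,
            cuda_speedup(gpu, cpu),
            scaling_efficiency(gpu, base_gpu, nodes),
            scaling_efficiency(cpu, base_cpu, nodes),
        ))
    return rows


if __name__ == "__main__":
    for nodes, speedup, eff_gpu, eff_cpu in summarize(APOA1):
        print(f"{nodes} node(s): CUDA speedup {speedup:.2f}x, "
              f"efficiency with CUDA {eff_gpu:.0%}, without CUDA {eff_cpu:.0%}")
```

Under that definition, the CUDA runs keep roughly half of ideal scaling at four nodes while the CPU-only runs keep close to 90%, which is why the speedup from CUDA shrinks as nodes are added.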
For small systems (~5,000-6,000 atoms) you will have to stick to a single node. Notably, because of the small number of atoms, it makes no difference whether you work with the i7+GTX295 or with a Q6600+GTX460 node: your simulation will plateau at ~26 nanoseconds per day, which is approximately 30% faster than what you would get from two Q6600 nodes without GPUs (or 70% faster than a single i7 box without a GPU).
For a slightly larger, 12,000-atom system, the numbers (in nanoseconds per day) are:
Nodes/Cores/GPUs | With CUDA | Without CUDA | Times faster with CUDA |
---|---|---|---|
1 / 4 / 1 | 14.7 | 7.57 | 1.94 |
2 / 8 / 2 | 20.0 | 12.20 | 1.64 |
4 / 16 / 4 | 9.90 |
For a 26,000-atom system, using a set of 10-12-14 cutoffs and 2-1-2 steps, two nodes (with CUDA) give 8 nanoseconds per day.