This is a 99,744-atom system with a PME grid of 112x108x108 (the script is included below). For all of the tests that follow, we used the NAMD 2.6 amd64 executable as provided by the NAMD developers.
NAMD script used for these tests
# Input files
#
structure ionized.psf
coordinates heat_out.coor
velocities heat_out.vel
extendedSystem heat_out.xsc
parameters par_all27_prot_na.inp
paraTypeCharmm on
#
# Output files & writing frequency for DCD
# and restart files
#
outputname output/equi_out
binaryoutput off
restartname output/restart
restartfreq 1000
binaryrestart yes
dcdFile output/equi_out.dcd
dcdFreq 200
DCDunitcell on
#
# Frequencies for logs and the xst file
#
outputEnergies 20
outputTiming 200
xstFreq 200
#
# Timestep & friends
#
timestep 2.0
stepsPerCycle 8
nonBondedFreq 2
fullElectFrequency 4
#
# Simulation space partitioning
#
switching on
switchDist 10
cutoff 12
pairlistdist 13.5
#
# Basic dynamics
#
COMmotion no
dielectric 1.0
exclude scaled1-4
1-4scaling 1.0
rigidbonds all
#
# Particle Mesh Ewald parameters.
#
Pme on
PmeGridsizeX 112 # <===== CHANGE ME
PmeGridsizeY 108 # <===== CHANGE ME
PmeGridsizeZ 108 # <===== CHANGE ME
# Pmeprocessors 8
#
# Periodic boundary things
#
wrapWater on
wrapNearest on
wrapAll on
#
# Langevin dynamics parameters
#
langevin on
langevinDamping 1
langevinTemp 298 # <===== Check me
langevinHydrogen on
langevinPiston on
langevinPistonTarget 1.01325
langevinPistonPeriod 200
langevinPistonDecay 100
langevinPistonTemp 298 # <===== Check me
useGroupPressure yes
firsttimestep 26000 # <===== CHANGE ME
run 25000000 ;# <===== CHANGE ME
              | 1 core | 4 cores | 8 cores | 16 cores | 32 cores |
Days per nsec |   8.35 |    2.50 |    1.40 |     0.89 |     0.60 |
nsec per day  |   0.12 |    0.40 |    0.71 |     1.12 |     1.66 |
Efficiency    |   100% |     83% |     75% |      59% |      44% |
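The efficiency row is simply the parallel efficiency, i.e. the speedup relative to the single-core run divided by the number of cores. A minimal Python sketch of that arithmetic, with the days-per-nsec values copied from the table above:

# Speedup and parallel efficiency from the "days per nsec" row above.
days_per_nsec = {1: 8.35, 4: 2.50, 8: 1.40, 16: 0.89, 32: 0.60}
baseline = days_per_nsec[1]
for cores, days in sorted(days_per_nsec.items()):
    speedup = baseline / days        # times faster than the 1-core run
    efficiency = speedup / cores     # fraction of ideal linear scaling
    print(f"{cores:2d} cores: {1.0 / days:4.2f} nsec/day, "
          f"speedup {speedup:5.2f}, efficiency {efficiency:4.0%}")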
Now try the following: instead of filling up all four cores of each node, distribute the work across different nodes (applicable only if fewer than 32 cores are needed). The following .nodelist file is one solution:
Modified .nodelist file
group main
host 10.0.1.11
host 10.0.1.12
host 10.0.1.13
host 10.0.1.14
host 10.0.1.15
host 10.0.1.16
host 10.0.1.17
host 10.0.1.18
host 10.0.0.11
host 10.0.0.12
host 10.0.0.13
host 10.0.0.14
host 10.0.0.15
host 10.0.0.16
host 10.0.0.17
host 10.0.0.18
host 10.0.1.11
host 10.0.1.12
host 10.0.1.13
host 10.0.1.14
host 10.0.1.15
host 10.0.1.16
host 10.0.1.17
host 10.0.1.18
host 10.0.0.11
host 10.0.0.12
host 10.0.0.13
host 10.0.0.14
host 10.0.0.15
host 10.0.0.16
host 10.0.0.17
host 10.0.0.18
Using this nodelist file and repeating the measurements, we have:
              | 1 core | 4 cores | 8 cores | 16 cores | 32 cores |
Days per nsec |   8.35 |    2.25 |    1.19 |     0.84 |     0.65 |
nsec per day  |   0.12 |    0.44 |    0.84 |     1.19 |     1.54 |
Efficiency    |   100% |     93% |     88% |      62% |      40% |
which means that for anything up to and including 16 cores, you are better off with the nodelist file shown above.
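Charmrun assigns processes to hosts in the order the host lines appear in the nodelist file (picked up with something along the lines of charmrun ++nodelist .nodelist +pN namd2 <configfile>), so the placement is controlled entirely by that ordering. A round-robin nodelist like the one above is also easy to generate instead of maintaining by hand; a minimal Python sketch, using the addresses listed above (the output file name and the 32-entry length are just this example's assumptions):

# Write a round-robin Charm++ nodelist: one process per interface/node
# before wrapping around (reproduces the file shown above).
nodes = [f"10.0.1.{i}" for i in range(11, 19)] + \
        [f"10.0.0.{i}" for i in range(11, 19)]

with open(".nodelist", "w") as f:          # file name is an assumption
    f.write("group main\n")
    for rank in range(32):                 # enough entries for 32 cores
        f.write(f"host {nodes[rank % len(nodes)]}\n")

Strictly speaking the repetition is not required, since charmrun cycles through the host lines again when +p exceeds their number; listing all 32 entries just makes the intended placement explicit.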
Finally, we tried filling pairs of cores before moving on to the next node, using a nodelist of the form:
group main
host 10.0.0.11
host 10.0.1.11
host 10.0.0.12
host 10.0.1.12
host 10.0.0.13
host 10.0.1.13
host 10.0.0.14
host 10.0.1.14
host 10.0.0.15
host 10.0.1.15
host 10.0.0.16
host 10.0.1.16
host 10.0.0.17
host 10.0.1.17
host 10.0.0.18
host 10.0.1.18
host 10.0.0.11
host 10.0.1.11
…
This ordering gave worse scaling than the previously mentioned solution.
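For completeness, the same kind of generator gives this "pairs first" ordering if the two address groups are interleaved; a minimal sketch along the same lines (again, the output file name is an assumption):

# Interleave the two address groups so consecutive ranks are paired on
# the same node, per the ordering shown above (which scaled worse here).
group_a = [f"10.0.0.{i}" for i in range(11, 19)]
group_b = [f"10.0.1.{i}" for i in range(11, 19)]
nodes = [ip for pair in zip(group_a, group_b) for ip in pair]

with open(".nodelist", "w") as f:
    f.write("group main\n")
    for rank in range(32):
        f.write(f"host {nodes[rank % len(nodes)]}\n")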