gromacs – setup notes – Performance analysis, tools and experiments

This post documents the steps I used to build, install and run gromacs.

gromacs is a molecular dynamics package for simulating proteins, lipids and nucleic acids. It seems to be one of the more common HPC applications. My interest is less in the computational chemistry and more in characterizing how the program uses microprocessor resources.

The gromacs web site is www.gromacs.org.

Building gromacs was pretty easy. There is documentation, but it was mostly a download, cmake and install process. I built two versions: (a) the most recent, 2018.1, and (b) the last of the 5.1 series, 5.1.5. I built both because there seem to be some interface changes between them, and having the older one lets me try tutorials and benchmarks with the 5.1.x version first.

The following command sets up my environment from my alternate build location:

source /work/sandbox/gromacs-5.1.5/bin/GMXRC.bash

The tutorial page contains multiple examples; I started with the Lysozyme in Water tutorial.

One of the first steps was to go to the Protein Data Bank and download the structure information for hen egg white lysozyme. I then used the “pymol” program on Ubuntu to visualize this molecule.

After this, I stepped through the various transformation examples given in the tutorial. Included in these were four more computation-heavy steps that called “gmx mdrun”. Below is a summary of running the “topdown” tool to characterize these runs:

gmx mdrun -deffnm em

on_cpu         0.946
elapsed        17.080
utime          129.163
stime          0.100
nvcsw          694 (13.93%)
nivcsw         4287 (86.07%)
inblock        0
onblock        4792
retire         0.520
ms_uops                0.013
speculation    0.096
branch_misses          91.60%
machine_clears         8.40%
frontend       0.194
idq_uops_delivered_0   0.053
icache_stall               0.003
itlb_misses                0.000
idq_uops_delivered_1   0.077
idq_uops_delivered_2   0.111
idq_uops_delivered_3   0.148
dsb_ops                    57.64%
backend        0.190
resource_stalls.sb     0.028
stalls_ldm_pending     0.199
l2_refs                    0.018
l2_misses                  0.004
l2_miss_ratio              23.06%
l3_refs                    0.002
l3_misses                  0.000
l3_miss_ratio              24.19%

Quick: only 17 seconds and 95% on_cpu, with ~9% lost to branch misses, 19% to frontend stalls (mostly cycles that deliver fewer than the full 4 uops) and 19% to backend stalls, mostly memory.
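The summary figures above can be reproduced from the raw counters. The sketch below is my reconstruction, not the topdown tool's own code: it assumes on_cpu is (utime + stime) divided by elapsed time times the number of hardware threads, and that the branch-miss loss is the speculation fraction scaled by the branch_misses share. The thread count of 8 is an assumption that is not stated in the post; it is simply the value that reproduces the reported on_cpu.

```python
def on_cpu(utime, stime, elapsed, ncpus):
    """Fraction of the available CPU time the process actually used."""
    return (utime + stime) / (elapsed * ncpus)


def branch_miss_loss(speculation, branch_miss_share):
    """Portion of all pipeline slots lost to branch mispredictions."""
    return speculation * branch_miss_share


# Assumption: 8 hardware threads (not stated in the post).
NCPUS = 8

# Values copied from the "gmx mdrun -deffnm em" output.
em_on_cpu = on_cpu(utime=129.163, stime=0.100, elapsed=17.080, ncpus=NCPUS)
em_branch_loss = branch_miss_loss(speculation=0.096, branch_miss_share=0.9160)

print(f"on_cpu           = {em_on_cpu:.3f}")       # 0.946
print(f"branch-miss loss = {em_branch_loss:.3f}")  # 0.088
```

The second number is where the "~9% lost to branch misses" comes from: 91.6% of the 9.6% speculation bucket.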

gmx mdrun -deffnm nvt

on_cpu         0.996
elapsed        371.606
utime          2960.924
stime          1.512
nvcsw          1649 (8.74%)
nivcsw         17222 (91.26%)
inblock        0
onblock        167128
retire         0.619
ms_uops                0.007
speculation    0.017
branch_misses          77.12%
machine_clears         22.88%
frontend       0.122
idq_uops_delivered_0   0.053
icache_stall               0.004
itlb_misses                0.000
idq_uops_delivered_1   0.057
idq_uops_delivered_2   0.063
idq_uops_delivered_3   0.072
dsb_ops                    84.24%
backend        0.242
resource_stalls.sb     0.036
stalls_ldm_pending     0.171
l2_refs                    0.031
l2_misses                  0.006
l2_miss_ratio              19.41%
l3_refs                    0.002
l3_misses                  0.001
l3_miss_ratio              28.82%

Over 6 minutes with almost 100% on_cpu. Compared with the em run, the retirement rate is higher (62% vs. 52%), only about 2% is lost to bad speculation, frontend stalls are lower (12% vs. 19%) and backend stalls are slightly higher (24% vs. 19%) with more memory operations.
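The percentages printed next to nvcsw and nivcsw look like each counter's share of the total context switches. A minimal sketch of that arithmetic, assuming this is how the tool derives them (the function name switch_shares is mine, not the tool's):

```python
def switch_shares(nvcsw, nivcsw):
    """Return (voluntary, involuntary) shares of all context switches."""
    total = nvcsw + nivcsw
    return nvcsw / total, nivcsw / total


# Values copied from the "gmx mdrun -deffnm nvt" output.
vol, invol = switch_shares(nvcsw=1649, nivcsw=17222)

print(f"voluntary   {vol:.2%}")    # 8.74%
print(f"involuntary {invol:.2%}")  # 91.26%
```

Involuntary switches happen when a runnable thread is preempted rather than when it blocks, so a share above 90% fits a run that is on the CPU almost 100% of the time.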

gmx mdrun -deffnm npt

on_cpu         0.996
elapsed        387.031
utime          3081.660
stime          1.300
nvcsw          2430 (12.13%)
nivcsw         17609 (87.87%)
inblock        0
onblock        167160
retire         0.621
ms_uops                0.008
speculation    0.017
branch_misses          68.63%
machine_clears         31.37%
frontend       0.126
idq_uops_delivered_0   0.055
icache_stall               0.004
itlb_misses                0.000
idq_uops_delivered_1   0.059
idq_uops_delivered_2   0.065
idq_uops_delivered_3   0.074
dsb_ops                    84.59%
backend        0.236
resource_stalls.sb     0.033
stalls_ldm_pending     0.165
l2_refs                    0.030
l2_misses                  0.006
l2_miss_ratio              20.13%
l3_refs                    0.002
l3_misses                  0.000
l3_miss_ratio              24.33%

Over 6 minutes, with about 2% lost to bad speculation, lower frontend stalls and higher backend stalls than the em run. The uop cache provides 85% of the total uops.
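The four top-level buckets (retire, speculation, frontend, backend) should account for all pipeline slots, which makes it easy to sanity-check a run and to compare two runs bucket by bucket. A small sketch using the em and npt numbers from the outputs above; the dictionary layout and helper name are my own choices for illustration:

```python
# Top-level fractions copied from the em and npt outputs.
em = {"retire": 0.520, "speculation": 0.096, "frontend": 0.194, "backend": 0.190}
npt = {"retire": 0.621, "speculation": 0.017, "frontend": 0.126, "backend": 0.236}


def bucket_deltas(base, other):
    """Per-bucket change going from the base run to the other run."""
    return {name: other[name] - base[name] for name in base}


for label, run in (("em", em), ("npt", npt)):
    # Each run's buckets should sum to 1.0 within rounding of the printed values.
    print(f"{label:4s} total = {sum(run.values()):.3f}")

for name, delta in bucket_deltas(em, npt).items():
    print(f"{name:12s} {delta:+.3f}")
```

The deltas show the slots recovered from speculation and the frontend going partly to retirement (about +0.10) and partly to backend stalls (about +0.05).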

gmx mdrun -deffnm md_0_1

on_cpu         0.999
elapsed        3918.340
utime          31292.743
stime          13.975
nvcsw          5907 (3.88%)
nivcsw         146201 (96.12%)
inblock        0
onblock        207376
retire         0.610
ms_uops                0.007
speculation    0.016
branch_misses          72.60%
machine_clears         27.40%
frontend       0.124
idq_uops_delivered_0   0.054
icache_stall               0.003
itlb_misses                0.000
idq_uops_delivered_1   0.058
idq_uops_delivered_2   0.064
idq_uops_delivered_3   0.072
dsb_ops                    84.67%
backend        0.250
resource_stalls.sb     0.036
stalls_ldm_pending     0.179
l2_refs                    0.030
l2_misses                  0.006
l2_miss_ratio              20.48%
l3_refs                    0.002
l3_misses                  0.001
l3_miss_ratio              27.15%

More than an hour, with a retirement rate similar to the previous two runs, about 2% lost to bad speculation, 85% of uops coming from the uop cache and a moderate amount of backend memory stalls.
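For comparing runs it helps to get this output into a dictionary. The parser below is a rough sketch that assumes the line shapes seen above ("name value", "name value%" and "name count (share%)"); parse_topdown is a hypothetical helper, not part of the topdown tool. The sample text is a subset of the md_0_1 output.

```python
def parse_topdown(text):
    """Parse 'name value' lines into a dict; percentages become fractions."""
    metrics = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, raw = fields[0], fields[1]
        if raw.endswith("%"):
            metrics[name] = float(raw[:-1]) / 100.0
        else:
            # Any trailing "(12.34%)" field is ignored; only the count is kept.
            metrics[name] = float(raw)
    return metrics


sample = """\
on_cpu         0.999
elapsed        3918.340
utime          31292.743
stime          13.975
nvcsw          5907 (3.88%)
retire         0.610
speculation    0.016
branch_misses          72.60%
frontend       0.124
backend        0.250
"""

md = parse_topdown(sample)
cpu_hours = (md["utime"] + md["stime"]) / 3600.0

print(f"elapsed   {md['elapsed'] / 60:.1f} minutes")
print(f"cpu time  {cpu_hours:.2f} CPU-hours")
print(f"retire    {md['retire']:.1%}")
```

With the four runs parsed this way, the top-level buckets can be lined up side by side instead of being compared by eye.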

From here, the next steps are to do a slightly more complete analysis, similar to that done for other benchmarks, and then to try some other tutorial/benchmark inputs to see how gromacs behaves for molecules with different topologies.
