gromacs – setup notes
This post documents the steps I used to build, install and run gromacs.
gromacs is a molecular dynamics package for simulating proteins, lipids and nucleic acids. It seems to be one of the more common HPC applications. My interest is less in the computational chemistry and more in characterizing how the program uses microprocessor resources.
The gromacs web site is www.gromacs.org.
Building gromacs was pretty easy. There is documentation, but it was mostly a download, cmake and install process. I built two versions: (a) the most recent, 2018.1, and (b) the last of the 5.1 series, 5.1.5. I built both because there seem to be some interface changes between them, and this might make it easier to try tutorials and benchmarks with the 5.1.x version first.
The following command sets up my environment from my alternate build location:
source /work/sandbox/gromacs-5.1.5/bin/GMXRC.bash
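As a quick sanity check after sourcing the script, a Python sketch along these lines can report whether the environment looks right. It assumes GMXRC exports the variables GMXBIN, GMXLDLIB, GMXMAN and GMXDATA and puts gmx on the PATH; the helper names are hypothetical, and Python has to be started from the same shell that sourced GMXRC.bash.

```python
import os
import shutil

# Variables GMXRC is assumed to export (an assumption, check your own GMXRC).
GMX_VARS = ("GMXBIN", "GMXLDLIB", "GMXMAN", "GMXDATA")


def gmx_environment(env=None):
    """Return the assumed GROMACS variables and their values (None if unset)."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in GMX_VARS}


def gmx_on_path():
    """Return the full path of the gmx binary, or None if it is not on PATH."""
    return shutil.which("gmx")


for name, value in gmx_environment().items():
    print(f"{name:10s} {value if value is not None else '(not set)'}")
print(f"{'gmx':10s} {gmx_on_path() or '(not found on PATH)'}")
```

If the variables point at the 5.1.5 tree and gmx resolves to /work/sandbox/gromacs-5.1.5/bin, the alternate build is the one that will run.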
The tutorial page contains multiple examples; I started with the Lysozyme in Water tutorial.
One of the first steps was to go to the Protein Data Bank and download the information for the hen egg white lysozyme. I then used the “pymol” program on Ubuntu to visualize this molecule.
After this, I stepped through the various transformation examples given in the tutorial. Included in these were four more computation-heavy steps that called “gmx mdrun”. Below is a summary of running the “topdown” tool to characterize these runs:
gmx mdrun -deffnm em
on_cpu                   0.946
elapsed                  17.080
utime                    129.163
stime                    0.100
nvcsw                    694 (13.93%)
nivcsw                   4287 (86.07%)
inblock                  0
onblock                  4792
retire                   0.520
  ms_uops                0.013
speculation              0.096
  branch_misses          91.60%
  machine_clears         8.40%
frontend                 0.194
  idq_uops_delivered_0   0.053
  icache_stall           0.003
  itlb_misses            0.000
  idq_uops_delivered_1   0.077
  idq_uops_delivered_2   0.111
  idq_uops_delivered_3   0.148
  dsb_ops                57.64%
backend                  0.190
  resource_stalls.sb     0.028
  stalls_ldm_pending     0.199
  l2_refs                0.018
  l2_misses              0.004
  l2_miss_ratio          23.06%
  l3_refs                0.002
  l3_misses              0.000
  l3_miss_ratio          24.19%
Quick: only 17 seconds and 95% On_CPU, with ~9% lost to branch misses, 19% to frontend stalls (mostly from not filling all 4 uop slots per cycle) and 19% to backend stalls, mostly memory.
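The ~9% figure comes from combining two lines of the output. Here is a minimal sketch of the arithmetic, assuming the branch_misses and machine_clears percentages are shares of the speculation fraction (my reading of the output layout, not something stated by the tool); the function name is hypothetical.

```python
def share_of_parent(parent_fraction, child_percent):
    """Scale a child percentage by its parent's fraction of pipeline slots."""
    return parent_fraction * child_percent / 100.0


# Values copied from the "gmx mdrun -deffnm em" summary.
speculation = 0.096
branch_misses_pct = 91.60
machine_clears_pct = 8.40

branch_loss = share_of_parent(speculation, branch_misses_pct)
clears_loss = share_of_parent(speculation, machine_clears_pct)

print(f"lost to branch misses:  {branch_loss:.3f}")
print(f"lost to machine clears: {clears_loss:.3f}")
```

Under that assumption, branch misses account for a bit under 0.09 of the total, which is where the rounded ~9% comes from; machine clears make up the small remainder of the speculation loss.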
gmx mdrun -deffnm nvt
on_cpu                   0.996
elapsed                  371.606
utime                    2960.924
stime                    1.512
nvcsw                    1649 (8.74%)
nivcsw                   17222 (91.26%)
inblock                  0
onblock                  167128
retire                   0.619
  ms_uops                0.007
speculation              0.017
  branch_misses          77.12%
  machine_clears         22.88%
frontend                 0.122
  idq_uops_delivered_0   0.053
  icache_stall           0.004
  itlb_misses            0.000
  idq_uops_delivered_1   0.057
  idq_uops_delivered_2   0.063
  idq_uops_delivered_3   0.072
  dsb_ops                84.24%
backend                  0.242
  resource_stalls.sb     0.036
  stalls_ldm_pending     0.171
  l2_refs                0.031
  l2_misses              0.006
  l2_miss_ratio          19.41%
  l3_refs                0.002
  l3_misses              0.001
  l3_miss_ratio          28.82%
Over 6 minutes with almost 100% On_CPU. Compared with the em step, the retirement rate is higher (62% vs. 52%), only about 2% is lost to speculation (mostly branch misses), frontend stalls are lower (12%) and backend stalls are higher (24%), with more memory operations.
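The On_CPU value can be reproduced from the timing fields. The sketch below assumes on_cpu is (utime + stime) / (elapsed × number of logical CPUs) and that the machine exposes 8 logical CPUs; neither is stated by the tool output, but both are consistent with all four runs. The measure() helper is a hypothetical stand-in that collects the same timing fields with Python's resource module, not the topdown tool itself.

```python
import os
import resource
import subprocess
import sys
import time


def on_cpu(utime, stime, elapsed, ncpus):
    """Fraction of available CPU time spent running (assumed definition)."""
    return (utime + stime) / (elapsed * ncpus)


def measure(cmd, ncpus=None):
    """Run cmd and return elapsed/utime/stime/on_cpu for the child process."""
    ncpus = ncpus or os.cpu_count() or 1
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    utime = after.ru_utime - before.ru_utime
    stime = after.ru_stime - before.ru_stime
    return {"elapsed": elapsed, "utime": utime, "stime": stime,
            "on_cpu": on_cpu(utime, stime, elapsed, ncpus)}


# Timing fields copied from the em and nvt summaries; 8 CPUs is an assumption.
ASSUMED_CPUS = 8
em_on_cpu = on_cpu(129.163, 0.100, 17.080, ASSUMED_CPUS)
nvt_on_cpu = on_cpu(2960.924, 1.512, 371.606, ASSUMED_CPUS)
print(f"em  on_cpu: {em_on_cpu:.3f}")
print(f"nvt on_cpu: {nvt_on_cpu:.3f}")

if __name__ == "__main__":
    # Trivial child process, just to show the call shape.
    print(measure([sys.executable, "-c", "pass"]))
```

In a real run the command list would be something like ["gmx", "mdrun", "-deffnm", "nvt"] instead of the trivial child process.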
gmx mdrun -deffnm npt
on_cpu                   0.996
elapsed                  387.031
utime                    3081.660
stime                    1.300
nvcsw                    2430 (12.13%)
nivcsw                   17609 (87.87%)
inblock                  0
onblock                  167160
retire                   0.621
  ms_uops                0.008
speculation              0.017
  branch_misses          68.63%
  machine_clears         31.37%
frontend                 0.126
  idq_uops_delivered_0   0.055
  icache_stall           0.004
  itlb_misses            0.000
  idq_uops_delivered_1   0.059
  idq_uops_delivered_2   0.065
  idq_uops_delivered_3   0.074
  dsb_ops                84.59%
backend                  0.236
  resource_stalls.sb     0.033
  stalls_ldm_pending     0.165
  l2_refs                0.030
  l2_misses              0.006
  l2_miss_ratio          20.13%
  l3_refs                0.002
  l3_misses              0.000
  l3_miss_ratio          24.33%
Again over 6 minutes, with about 2% lost to speculation and, relative to the em step, lower frontend stalls and higher backend stalls. The uop cache (dsb_ops) provides 85% of the total uops.
gmx mdrun -deffnm md_0_1
on_cpu                   0.999
elapsed                  3918.340
utime                    31292.743
stime                    13.975
nvcsw                    5907 (3.88%)
nivcsw                   146201 (96.12%)
inblock                  0
onblock                  207376
retire                   0.610
  ms_uops                0.007
speculation              0.016
  branch_misses          72.60%
  machine_clears         27.40%
frontend                 0.124
  idq_uops_delivered_0   0.054
  icache_stall           0.003
  itlb_misses            0.000
  idq_uops_delivered_1   0.058
  idq_uops_delivered_2   0.064
  idq_uops_delivered_3   0.072
  dsb_ops                84.67%
backend                  0.250
  resource_stalls.sb     0.036
  stalls_ldm_pending     0.179
  l2_refs                0.030
  l2_misses              0.006
  l2_miss_ratio          20.48%
  l3_refs                0.002
  l3_misses              0.001
  l3_miss_ratio          27.15%
More than an hour, with a retirement rate similar to the previous two steps, about 2% lost to speculation, 85% of uops coming from the uop cache and a moderate amount of backend memory stalls.
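To compare the four runs side by side, the top-level fractions can be put into one small table. This is only a sketch with the values copied from the summaries above; treating retire, speculation, frontend and backend as the four top-level categories that should add up to roughly 1.0 is my assumption about how the tool partitions pipeline slots, and the names RUNS and category_total are hypothetical.

```python
# Top-level fractions copied from the four topdown summaries above.
RUNS = {
    "em": {"elapsed": 17.080, "retire": 0.520, "speculation": 0.096,
           "frontend": 0.194, "backend": 0.190},
    "nvt": {"elapsed": 371.606, "retire": 0.619, "speculation": 0.017,
            "frontend": 0.122, "backend": 0.242},
    "npt": {"elapsed": 387.031, "retire": 0.621, "speculation": 0.017,
            "frontend": 0.126, "backend": 0.236},
    "md_0_1": {"elapsed": 3918.340, "retire": 0.610, "speculation": 0.016,
               "frontend": 0.124, "backend": 0.250},
}
CATEGORIES = ("retire", "speculation", "frontend", "backend")


def category_total(run):
    """Sum of the four top-level categories; expected to be close to 1.0."""
    return sum(run[c] for c in CATEGORIES)


header = " ".join(f"{c:>11s}" for c in CATEGORIES)
print(f"{'run':8s} {'elapsed':>10s} {header} {'total':>7s}")
for name, run in RUNS.items():
    cols = " ".join(f"{run[c]:11.3f}" for c in CATEGORIES)
    print(f"{name:8s} {run['elapsed']:10.3f} {cols} {category_total(run):7.3f}")
```

Laid out this way, the three longer runs look nearly identical to each other, and the short em step stands out with a lower retirement rate and more speculation and frontend loss.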
From here, the next steps are to do a slightly more complete analysis, similar to that done for other benchmarks, and then to try some other tutorial/benchmark inputs to see how gromacs behaves for molecules with different topologies.