I downloaded GROMACS from www.gromacs.org and built version 5.1.5, because some of the tutorials and benchmarks still refer to this older release. This post documents the steps I used to run GROMACS 5.1.5 on a Ryzen processor.
When I ran the executable, it died with a SIGILL (illegal instruction) fault. Looking further, it seems the 5.1.5 build explicitly adds the -mxop and -mfma4 compiler options when it detects an AMD platform. Unfortunately, XOP and FMA4 are two instruction set extensions that are no longer available on Ryzen.
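One way to check which extensions a CPU advertises before building is to look at the flags line in /proc/cpuinfo on Linux. The following is a minimal Python sketch under that assumption; the helper names `cpu_flags` and `unsupported_flags` and the sample text are made up for illustration, and the sample is not a dump from the Ryzen machine.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def unsupported_flags(cpuinfo_text, wanted=("xop", "fma4")):
    """Return the entries of `wanted` that the CPU does not advertise, in order."""
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in present]


# Illustrative sample only, not output from a real machine.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 fma\n"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:  # Linux-specific path
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when the file is unavailable
    print("not advertised:", unsupported_flags(text))
```

If either xop or fma4 shows up in the result, a binary compiled with the matching -m option may contain instructions the processor cannot execute.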
My next attempt was to set the CFLAGS and CXXFLAGS variables on the command line to explicitly turn these options off, namely:
CFLAGS="-mno-fma4 -mno-xop -mfma" CXXFLAGS="-mno-fma4 -mno-xop -mfma"
Unfortunately, this still results in a compile error:
/usr/lib/gcc/x86_64-linux-gnu/7/include/fma4intrin.h:91:1: error: inlining failed in call to always_inline ‘__m128 _mm_nmacc_ps(__m128, __m128, __m128)’: target specific option mismatch
 _mm_nmacc_ps (__m128 __A, __m128 __B, __m128 __C)
 ^~~~~~~~~~~~
In file included from /work/source/gromacs-5.1.5/src/gromacs/listed-forces/bonded.h:59:0,
                 from /work/source/gromacs-5.1.5/src/gromacs/listed-forces/bonded.cpp:48:
/work/source/gromacs-5.1.5/src/gromacs/pbcutil/pbc-simd.h:129:28: note: called from here
     *dx = gmx_simd_fnmadd_r(shx, pbc->bxx, *dx);
src/gromacs/CMakeFiles/libgromacs.dir/build.make:525: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/listed-forces/bonded.cpp.o' failed
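So even with these CFLAGS options, GROMACS still seems to want to configure FMA4: the SIMD code still calls FMA4 intrinsics such as _mm_nmacc_ps, and the compiler can no longer inline them once -mno-fma4 is in effect.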
My next approach was to look in the installation guide and pass cmake the option -DGMX_SIMD=AVX2_256, which selects the SIMD instruction set explicitly instead of relying on detection.
This seems to get past the SIGILL fault.
An additional item I noticed is how the thread layout gets determined: the 4-core, 8-thread Haswell system is allocated 1 MPI rank with 8 OpenMP threads, while the 8-core, 16-thread AMD system is allocated 16 MPI ranks with 1 OpenMP thread each.
Running on 1 node with total 16 cores, 16 logical cores
Hardware detected:
  CPU info:
    Vendor: AuthenticAMD
    Brand:  AMD Ryzen 7 1700 Eight-Core Processor
    SIMD instructions most likely to fit this hardware: AVX_128_FMA
    SIMD instructions selected at GROMACS compile time: AVX2_256
Compiled SIMD instructions: AVX2_256, GROMACS could use AVX_128_FMA on this machine, which is better
Reading file nvt.tpr, VERSION 5.1.5 (single precision)
Changing nstlist from 10 to 20, rlist from 1 to 1.029
Using 16 MPI threads
Using 1 OpenMP thread per tMPI thread
I tried changing this to one MPI rank with “-ntmpi 1” and received an error:
Fatal error:
Your choice of 1 MPI rank and the use of 16 total threads leads to the use of 16 OpenMP threads, whereas we expect the optimum to be with more MPI ranks with 1 to 6 OpenMP threads. If you want to run with this many OpenMP threads, specify the -ntomp option. But we suggest to increase the number of MPI ranks (option -ntmpi).
For more information and tips for troubleshooting, please check the GROMACS website at http://www.gromacs.org/Documentation/Errors
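The message follows from how the threads are split: the total thread count is the number of thread-MPI ranks times the OpenMP threads per rank, so 1 rank on 16 threads implies 16 OpenMP threads unless told otherwise. As a rough sketch, the Python helper below builds an mdrun argument list that states both values explicitly. The -ntmpi and -ntomp options come from the error text; the helper name `mdrun_command`, the `gmx` binary name and the `-deffnm nvt` argument (taken from the nvt.tpr file in the log) are my assumptions and may need adjusting for a given install.

```python
def mdrun_command(total_threads, ntmpi, deffnm="nvt", gmx="gmx"):
    """Build an mdrun argument list with an explicit rank/thread split.

    total_threads must divide evenly into ntmpi ranks; each rank then
    gets total_threads // ntmpi OpenMP threads.
    """
    if ntmpi < 1 or total_threads % ntmpi != 0:
        raise ValueError("total_threads must be a positive multiple of ntmpi")
    ntomp = total_threads // ntmpi
    return [gmx, "mdrun",
            "-ntmpi", str(ntmpi),   # number of thread-MPI ranks
            "-ntomp", str(ntomp),   # OpenMP threads per rank
            "-deffnm", deffnm]      # assumed run name, matching nvt.tpr


if __name__ == "__main__":
    # 1 rank x 16 OpenMP threads, the layout the error message complained about
    print(" ".join(mdrun_command(16, 1)))
    # 16 ranks x 1 OpenMP thread, the layout GROMACS picked by default
    print(" ".join(mdrun_command(16, 16)))
```

The returned list can be passed to subprocess.run as-is, or joined with spaces and pasted into a shell.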
Also note that the performance difference between thread-MPI and OpenMP may not be that great. The GROMACS log file prints statistics at the end; with 16 OpenMP threads they are:
               Core t (s)   Wall t (s)        (%)
       Time:    56130.787     3516.059     1596.4
                                 58:36
                 (ns/day)    (hour/ns)
Performance:       24.573        0.977
while with 16 thread-MPI ranks they are:
               Core t (s)   Wall t (s)        (%)
       Time:    55557.516     3479.183     1596.9
                                 57:59
                 (ns/day)    (hour/ns)
Performance:       24.833        0.966
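To put a number on the gap, the short Python sketch below compares the two ns/day figures from the logs above. The function and constant names are mine; only the four performance values are taken from the log output. It also checks that the hour/ns column is simply 24 divided by ns/day.

```python
def percent_faster(baseline, other):
    """Relative gain of `other` over `baseline`, in percent."""
    return (other - baseline) / baseline * 100.0


def hours_per_ns(ns_per_day):
    """Convert simulation throughput in ns/day to wall-clock hours per ns."""
    return 24.0 / ns_per_day


# Values copied from the two log summaries.
OPENMP_NS_PER_DAY = 24.573   # 1 rank x 16 OpenMP threads
TMPI_NS_PER_DAY = 24.833     # 16 thread-MPI ranks x 1 OpenMP thread

gain = percent_faster(OPENMP_NS_PER_DAY, TMPI_NS_PER_DAY)

if __name__ == "__main__":
    print(f"thread-MPI vs OpenMP: {gain:.2f}% more ns/day")
    print(f"hour/ns: {hours_per_ns(OPENMP_NS_PER_DAY):.3f} vs "
          f"{hours_per_ns(TMPI_NS_PER_DAY):.3f}")
```

The gain works out to roughly 1%, which for a single run of each layout is small enough that it could be run-to-run noise.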