Running GROMACS with Wi4MPI

Learning objectives

With these hands-on exercises, participants will learn:

How to switch between MPI implementation at runtime thanks to Wi4MPI on a real HPC application.
How to execute a benchmark and measure performance with a common HPC benchmark.

What is GROMACS?

GROMACS is a versatile package to perform molecular dynamics, i.e., simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids, and nucleic acids that have a lot of complicated bonded interactions. Still, since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations), it is also used for research on non-biological systems, e.g., polymers and fluid dynamics.

GROMACS is a molecular dynamics application designed to simulate Newtonian equations of motion for systems with hundreds to millions of particles. GROMACS is designed to simulate biochemical molecules like proteins, lipids, and nucleic acids that have a lot of complicated bonded interactions.

Obtaining the benchmark

wget https://www.mpibpc.mpg.de/15101317/benchMEM.zip
unzip benchMEM.zip

This benchmark simulates a membrane channel protein embedded in a lipid bilayer surrounded by water and ions. With its size of ~80,000 atoms, it serves as a prototypical example for a large class of setups used to study all kinds of membrane-embedded proteins. For some more information, see here.

First execution

You can load the default GROMACS version installed in your environment and launch it:

Load GROMACS:

spack load gromacs

Launch the test case:

export OMP_NUM_THREADS=2
srun -n 32 -c 2 gmx_mpi mdrun -v -resethway -nsteps 10000 -ntomp ${OMP_NUM_THREADS} -s benchMEM.tpr

There are some parameters that can be modified when launching GROMACS:

-nsteps specifies the number of steps and can be used to run a more or less long benchmark.
-resethway resets the performance timers halfway through the run, removing the overhead of initialization and load balancing from the timings.
-ntomp defines the number of OpenMP threads used.

You can expect an output that looks like this:

10000 steps,     20.0 ps.
step 0
step 100, remaining wall clock time:    25 s
...
step 9800, remaining wall clock time:     0 s
step 9900, remaining wall clock time:     0 s
vol 0.70  imb F  1% pme/F 0.70
step 10000, remaining wall clock time:     0 s

Core t (s)   Wall t (s)        (%)
Time:      721.804       11.279     6399.6
(ns/day)    (hour/ns)
Performance:       76.618        0.313

You can note the performance obtained, in ns/day.

Switching MPI with Wi4MPI

You can now easily switch between MPI versions with Wi4MPI, for example, switching to Open MPI:

spack unload -a
spack load openmpi
spack load wi4mpi
spack load gromacs

export LD_LIBRARY_PATH=${WI4MPI_ROOT}/lib:${LD_LIBRARY_PATH}
export WI4MPI_TO=OMPI
export WI4MPI_RUN_MPI_C_LIB=/path/to/openmpi/lib/libmpi.so
export WI4MPI_RUN_MPI_F_LIB=/path/to/openmpi/lib/libmpi_mpifh.so
export WI4MPI_RUN_MPIIO_C_LIB=${WI4MPI_RUN_MPI_C_LIB}
export WI4MPI_RUN_MPIIO_F_LIB=${WI4MPI_RUN_MPI_F_LIB}
export WI4MPI_WRAPPER_LIB=/path/to/wi4mpi/lib_${WI4MPI_TO}/libwi4mpi_${WI4MPI_TO}.so

And then run the app:

export OMP_NUM_THREADS=2
srun -n 32 -c 2 gmx_mpi mdrun -v -resethway -nsteps 10000 -ntomp ${OMP_NUM_THREADS} -s benchMEM.tpr

You can note the performance, in ns/day.