Conclusion

We tested GROMACS built against different dependencies using Spack, along with a couple of runtime configurations such as the number of ranks and threads.
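
As a recap, the builds in the tables below correspond to Spack specs along these lines (a minimal sketch; the compiler and environment setup are covered in the earlier chapters):

```bash
# Baseline build with default dependencies
spack install gromacs@2021.1

# Build with Intel MKL as the FFT/BLAS provider
spack install gromacs@2021.1 ^intel-mkl

# Thread-MPI build without an external MPI library
spack install gromacs@2021.1 ~mpi

# GPU-enabled thread-MPI build used on the g4dn instances
spack install gromacs@2021.1 +cuda ~mpi
```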

We reached 5.7 ns/day on c5n.18xlarge instances, a 20% (1 ns/day) improvement over what we started out with.

| # | execution | spec | instance | Ranks x Threads | ns/day |
|---|-----------|------|----------|-----------------|--------|
| 1 | native | gromacs@2021.1 | c5n.18xl | 18 x 4 | 4.7 |
| 2 | native | gromacs@2021.1 | c5n.18xl | 36 x 2 | 5.3 |
| 3 | native | gromacs@2021.1 | c5n.18xl | 72 x 1 | 5.5 |
| 4 | native | gromacs@2021.1 ^intel-mkl | c5n.18xl | 36 x 2 | 5.4 |
| 5 | native | gromacs@2021.1 ^intel-mkl | c5n.18xl | 72 x 1 | 5.5 |
| 6 | native | gromacs@2021.1 ~mpi | c5n.18xl | 36 x 2 | 5.5 |
| 7 | native | gromacs@2021.1 ~mpi | c5n.18xl | 72 x 1 | 5.7 |
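
For reference, a thread-MPI run like execution 7 (72 ranks x 1 thread) can be launched roughly as follows; `benchmark.tpr` is a placeholder for the input file prepared earlier in the chapter:

```bash
# Make the thread-MPI build available in the current shell
spack load gromacs@2021.1 ~mpi

# 72 thread-MPI ranks with 1 OpenMP thread each (execution 7)
gmx mdrun -ntmpi 72 -ntomp 1 -s benchmark.tpr
```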

Afterwards we moved to g4dn instances and looked at how the price/performance can be improved by utilizing dense compute.

| # | execution | spec | instance | Ranks x Threads | ns/day |
|---|-----------|------|----------|-----------------|--------|
| 8 | native | gromacs@2021.1 +cuda ~mpi | g4dn.8xl | 1 x 32 | 6.3 |
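
A sketch of how such a GPU run can be launched (execution 8, 1 rank x 32 threads); the exact offload flags used for the measured number may differ, and `benchmark.tpr` is again a placeholder:

```bash
# Make the CUDA-enabled thread-MPI build available
spack load gromacs@2021.1 +cuda ~mpi

# Single thread-MPI rank, 32 OpenMP threads, with nonbonded (and optionally PME)
# work offloaded to the GPU
gmx mdrun -ntmpi 1 -ntomp 32 -nb gpu -pme gpu -s benchmark.tpr
```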

Finally, we showed that containerization is not a threat to performance, quite the opposite: it provides reproducibility and ease of deployment.

| # | execution | spec | instance | Ranks x Threads | ns/day |
|---|-----------|------|----------|-----------------|--------|
| 9 | sarus | gromacs@2021.1 ~mpi | c5n.18xl | 36 x 2 | 5.5 |
| 10 | sarus | gromacs@2021.1 ~mpi | c5n.18xl | 72 x 1 | 5.7 |
| 11 | sarus | gromacs@2021.1 +cuda ~mpi fftw precision=float | g4dn.8xl | 1 x 32 | 6.3 |
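
The containerized runs are launched through Sarus; a sketch with a hypothetical image name, assuming the image built in the container section, with the working directory bind-mounted so the input file is visible inside the container:

```bash
# Pull the (hypothetical) GROMACS image built earlier
sarus pull <registry>/gromacs:2021.1

# Execution 10: 72 thread-MPI ranks x 1 OpenMP thread inside the container
sarus run \
    --mount=type=bind,source=$PWD,destination=/data \
    <registry>/gromacs:2021.1 \
    gmx mdrun -ntmpi 72 -ntomp 1 -s /data/benchmark.tpr
```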

In this single-node benchmark chapter we experimented with different GROMACS installations and decompositions.