C5n

Community Images

To run this benchmark we fetch community images from gallery.ecr.aws/hpc.

We’ll use Thread-MPI builds with baked-in settings for how many OpenMP threads should be spawned.

Two images with different tags:

  1. c5n_18xl_on: built for a c5n.18xlarge with hyperthreading on. This image uses 72 MPI ranks.
  2. c5n_18xl_off: built for a c5n.18xlarge with hyperthreading off. This image uses 36 MPI ranks.

sarus pull public.ecr.aws/hpc/spack/gromacs/2021.1/threadmpi:c5n_18xl_on
sarus pull public.ecr.aws/hpc/spack/gromacs/2021.1/threadmpi:c5n_18xl_off
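
Since the two tags differ only in their assumptions about hyperthreading, it can help to confirm the compute node's CPU topology before choosing an image; a quick check, assuming a Linux node with `lscpu` available (hyperthreading on shows 2 threads per core):

```shell
# Show total vCPUs, threads per core, and cores per socket
lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket)'
```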

36 Ranks

Our first job is equivalent to running 36 ranks with 2 OpenMP threads each, even though Thread-MPI uses process threads to mimic MPI ranks.
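
Both rank-by-thread layouts in this benchmark multiply out to the same total thread count; a quick sanity check of the arithmetic in the job names (72 matches the vCPU count of a c5n.18xlarge with hyperthreading on):

```python
# Rank x OpenMP-thread layouts taken from the job names in this section
layouts = {"36x2": (36, 2), "72x1": (72, 1)}

for name, (ranks, threads) in layouts.items():
    total = ranks * threads
    print(f"{name}: {ranks} tMPI ranks x {threads} OpenMP threads = {total} threads")
```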

cat > gromacs-single-node-sarus-c5n-threadmpi-36x2.sbatch << \EOF
#!/bin/bash
#SBATCH --job-name=gromacs-single-node-sarus-c5n-threadmpi-36x2
#SBATCH --exclusive
#SBATCH --output=/fsx/logs/%x_%j.out
#SBATCH --partition=c5n

mkdir -p /fsx/jobs/${SLURM_JOBID}
export INPUT=/fsx/input/gromacs/benchRIB.tpr
sarus run --workdir=/fsx/jobs/${SLURM_JOBID} public.ecr.aws/hpc/spack/gromacs/2021.1/threadmpi:c5n_18xl_off
EOF

Let’s submit two of those jobs.

sbatch -N1 gromacs-single-node-sarus-c5n-threadmpi-36x2.sbatch
sbatch -N1 gromacs-single-node-sarus-c5n-threadmpi-36x2.sbatch

72 Ranks

This job is equivalent to running 72 ranks with 1 OpenMP thread each, even though Thread-MPI uses process threads to mimic MPI ranks.

cat > gromacs-single-node-sarus-c5n-threadmpi-72x1.sbatch << \EOF
#!/bin/bash
#SBATCH --job-name=gromacs-single-node-sarus-c5n-threadmpi-72x1
#SBATCH --exclusive
#SBATCH --output=/fsx/logs/%x_%j.out
#SBATCH --partition=c5n

mkdir -p /fsx/jobs/${SLURM_JOBID}
export INPUT=/fsx/input/gromacs/benchRIB.tpr
sarus run --workdir=/fsx/jobs/${SLURM_JOBID} public.ecr.aws/hpc/spack/gromacs/2021.1/threadmpi:c5n_18xl_on
EOF

Let’s submit two of those jobs.

sbatch -N1 gromacs-single-node-sarus-c5n-threadmpi-72x1.sbatch
sbatch -N1 gromacs-single-node-sarus-c5n-threadmpi-72x1.sbatch

Results

After those runs are done, we grep the performance results.

grep -B2 Performance /fsx/logs/gromacs-single-node-sarus-c5n-threadmpi-*
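
GROMACS prints a "Performance:" line near the end of each log, with ns/day in the first column; if you want the figure programmatically rather than via grep, a small sketch (the sample text below is illustrative, not a real result):

```python
import re

# Hypothetical excerpt of a GROMACS log tail; real runs print a
# "Performance:" line with ns/day and hour/ns columns.
sample = """\
                 (ns/day)    (hour/ns)
Performance:        5.700        4.211
"""

def ns_per_day(text):
    """Return the ns/day figure from a GROMACS 'Performance:' line, or None."""
    m = re.search(r"^Performance:\s+([\d.]+)", text, re.MULTILINE)
    return float(m.group(1)) if m else None

print(ns_per_day(sample))
```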

This extends the table we started in the decomposition section of the gromacs-on-pcluster workshop.

#   execution  spec                        instance  Ranks x Threads  ns/day
1   native     gromacs@2021.1              c5n.18xl  18 x 4           4.7
2   native     gromacs@2021.1              c5n.18xl  36 x 2           5.3
3   native     gromacs@2021.1              c5n.18xl  72 x 1           5.5
4   native     gromacs@2021.1 ^intel-mkl   c5n.18xl  36 x 2           5.4
5   native     gromacs@2021.1 ^intel-mkl   c5n.18xl  72 x 1           5.5
6   native     gromacs@2021.1 ~mpi         c5n.18xl  36 x 2           5.5
7   native     gromacs@2021.1 ~mpi         c5n.18xl  72 x 1           5.7
8   native     gromacs@2021.1 +cuda ~mpi   g4dn.8xl  1 x 32           6.3
9   sarus      gromacs@2021.1 ~mpi         c5n.18xl  36 x 2           5.5
10  sarus      gromacs@2021.1 ~mpi         c5n.18xl  72 x 1           5.7
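
Comparing the matching rows (native vs. sarus, both ~mpi builds on c5n.18xl) quantifies the container overhead; a sketch using the ns/day values from the table above:

```python
# ns/day for the ~mpi builds on c5n.18xl, native (rows 6-7) vs. sarus (rows 9-10)
native = {"36x2": 5.5, "72x1": 5.7}
sarus  = {"36x2": 5.5, "72x1": 5.7}

for layout in native:
    overhead = (native[layout] - sarus[layout]) / native[layout] * 100
    print(f"{layout}: container overhead {overhead:.1f}%")
```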

Please note: the containerized runs yield the same results as the native runs!