CUDA w/ Thread-MPI

CUDA

Next, we are going to use a CUDA-enabled build of GROMACS on a GPU node.

salloc -p g4dn -N1 --exclusive
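
The allocation may take a moment to come back. To check whether the job has started, a quick look at the queue should do (the ST column shows R once it is running):

squeue -u $USER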

Once the job is in the RUNNING state, we log in to the allocated node.

ssh $SLURM_NODELIST
spack env list
spack env activate aws
spack env list
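
If you want to double-check that the aws environment is really the active one before installing, this should confirm it:

spack env status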

We use Spack to install GROMACS with CUDA support (+cuda) and without an external MPI library (~mpi), so GROMACS falls back to its built-in thread-MPI.

time spack install --no-check-signature --no-checksum gromacs@2021.1 +cuda ~mpi

Spack is able to reuse many of the packages we installed earlier.
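
To see exactly which dependencies are being reused, you can ask Spack for the concretized spec with install markers; something along these lines should work:

spack spec -I gromacs@2021.1 +cuda ~mpi

Dependencies that are already installed should show up with a [+] marker.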

The installation takes about 5 minutes. Again, we capture the installation hash.

read -p "Please paste the hash: " GROMACS_CUDA_THREADMPI_HASH

We persist the variable in our shell profile in case we need to log in again.

echo "export GROMACS_CUDA_THREADMPI_HASH=${GROMACS_CUDA_THREADMPI_HASH}" |tee -a ~/.bashrc

Now we release the allocation:

exit

32 Threads

source ~/.bashrc
cat > gromacs-single-node-g4dn-cuda-tmpi-1x32.sbatch << \EOF
#!/bin/bash
#SBATCH --job-name=gromacs-single-node-g4dn-cuda-tmpi-1x32
#SBATCH --exclusive
#SBATCH --output=/fsx/logs/%x_%j.out
#SBATCH --partition=g4dn
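# One thread-MPI rank with 32 OpenMP threads per rank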
NRANKS=1
NTOMP=32

mkdir -p /fsx/jobs/${SLURM_JOBID}
cd /fsx/jobs/${SLURM_JOBID}

spack env activate aws
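# GROMACS_CUDA_THREADMPI_HASH is a placeholder; the sed call after the heredoc substitutes the real hash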
echo ">>> spack load /GROMACS_CUDA_THREADMPI_HASH"
spack load /GROMACS_CUDA_THREADMPI_HASH

set -x
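# -pme cpu keeps the PME calculation on the CPU; -ntmpi/-ntomp set thread-MPI ranks and OpenMP threads per rank;
# -resethway resets the performance counters halfway through the run so startup cost is excluded from the timing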
time gmx mdrun -pme cpu -ntmpi ${NRANKS} -ntomp ${NTOMP} -s /fsx/input/gromacs/benchRIB.tpr -resethway
EOF
sed -i -e 's/GROMACS_CUDA_THREADMPI_HASH/'${GROMACS_CUDA_THREADMPI_HASH}'/' gromacs-single-node-g4dn-cuda-tmpi-1x32.sbatch
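# Optional sanity check: the spack load line should now show the real hash instead of the placeholder
grep "spack load" gromacs-single-node-g4dn-cuda-tmpi-1x32.sbatch
# Submit the job twice so we get two runs to compare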
sbatch gromacs-single-node-g4dn-cuda-tmpi-1x32.sbatch
sbatch gromacs-single-node-g4dn-cuda-tmpi-1x32.sbatch

Results

After those runs are done, we grep the performance results.

grep -B2 Performance /fsx/logs/gromacs-single-node-g4dn-cuda-tmpi-1x32*
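
If you only need the ns/day values for the table below, a quick filter along these lines should work (mdrun prints ns/day as the first number on the Performance line):

grep -h "Performance:" /fsx/logs/gromacs-single-node-g4dn-cuda-tmpi-1x32* | awk '{print $2}'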

This extends the results table we started in the decomposition section.

#  execution  spec                        instance   Ranks x Threads  ns/day
1  native     gromacs@2021.1              c5n.18xl   18 x 4           4.7
2  native     gromacs@2021.1              c5n.18xl   36 x 2           5.3
3  native     gromacs@2021.1              c5n.18xl   72 x 1           5.5
4  native     gromacs@2021.1 ^intel-mkl   c5n.18xl   36 x 2           5.4
5  native     gromacs@2021.1 ^intel-mkl   c5n.18xl   72 x 1           5.5
6  native     gromacs@2021.1 ~mpi         c5n.18xl   36 x 2           5.5
7  native     gromacs@2021.1 ~mpi         c5n.18xl   72 x 1           5.7
8  native     gromacs@2021.1 +cuda ~mpi   g4dn.8xl   1 x 32           6.3