

4. Gromacs-2021.3 (comparing GPU queues)

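Description:

Using the HPCC Singularity container solution, GROMACS energy minimization (em), equilibration (nvt, npt) and production simulation (md) were run with the image `/fs00/software/singularity-images/ngc_gromacs_2021.3.sif` to compare their performance on the public shared 100% GPU queues 722080tiib, 72rtxib, 83a100ib and 723090ib.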

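Queue overview:

| Queue | Nodes | CPU cores per node | Memory per node (GB) | Memory per core (GB) | CPU clock (GHz) | GPUs per node | Memory per GPU (GB) | Theoretical peak (TFLOPS) |
|---|---|---|---|---|---|---|---|---|
| 83a100ib | 1 | 64 | 512 | 8 | 2.6 | 8 | 40 | Double: 82.9248 |
| 723090ib | 2 | 48 | 512 | 10.7 | 2.8 | 8 | 24 | Double: 4.3008; single: 569.280 |
| 722080tiib | 4 | 16 | 128 | 8.0 | 3.0 | 4 | 11 | Double: 3.072; single: 215.168 |
| 72rtxib | 3 | 16 | 128 | 8.0 | 3.0 | 4 | 24 | Double: 2.304; single: 195.744 |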

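An earlier test report for Gromacs-2021.3 (with all interactions computed on the GPU) simulated a 102,808-atom system (464 residues, 9 nt DNA, 31,709 SOL, 94 NA, 94 CL) for 50 ns and measured more than 250 ns/day on 83a100ib, more than 220 ns/day on 723090ib, more than 170 ns/day on 722080tiib and more than 180 ns/day on 72rtxib. However, the 83a100ib and 723090ib queues routinely have more than 80 jobs (NJOBS) waiting, so these two queues are usually not used for the preparatory stages that precede a production run.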

File location:

`/fs00/software/singularity-images/ngc_gromacs_2021.3.sif`

Job scripts:

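Energy minimization (em.lsf)

```bash
#BSUB -q 72rtxib
#BSUB -gpu "num=2"

module load singularity/latest

# LSB_HOSTS lists one host name per allocated slot; use that many OpenMP threads
export OMP_NUM_THREADS=$(echo $LSB_HOSTS | awk '{print NF}')

SINGULARITY="singularity run --nv /fs00/software/singularity-images/ngc_gromacs_2021.3.sif"

# Build the run input from the solvated, ionized structure, then minimize on the GPUs with 2 thread-MPI ranks
${SINGULARITY} gmx grompp -f minim.mdp -c 1aki_solv_ions.gro -p topol.top -o em.tpr
${SINGULARITY} gmx mdrun -nb gpu -ntmpi 2 -deffnm em
```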
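The `OMP_NUM_THREADS` line relies on LSF listing one host name in `LSB_HOSTS` for each allocated slot, so counting the whitespace-separated fields gives the number of slots. The sketch below isolates that step so it can be tried outside a job; `count_slots` and the host names are hypothetical.

```bash
#!/bin/bash
# Count the slots in an LSB_HOSTS-style list (one host name per slot),
# using the same awk expression the job scripts use for OMP_NUM_THREADS.
count_slots() {
  echo "$1" | awk '{print NF}'
}

# Hypothetical allocation: 4 slots on gpu01 and 2 slots on gpu02
count_slots "gpu01 gpu01 gpu01 gpu01 gpu02 gpu02"   # prints 6
```

Outside an LSF job `LSB_HOSTS` is unset, and the same expression prints 0.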

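Equilibration (nvt)

```bash
#BSUB -q 722080tiib
#BSUB -gpu "num=2"

module load singularity/latest

# LSB_HOSTS lists one host name per allocated slot; use that many OpenMP threads
export OMP_NUM_THREADS=$(echo $LSB_HOSTS | awk '{print NF}')

SINGULARITY="singularity run --nv /fs00/software/singularity-images/ngc_gromacs_2021.3.sif"

# Start from the minimized structure; -r em.gro supplies the reference coordinates for position restraints
${SINGULARITY} gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr
${SINGULARITY} gmx mdrun -nb gpu -ntmpi 2 -deffnm nvt
```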

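Equilibration (npt)

```bash
#BSUB -q 722080tiib
#BSUB -gpu "num=2"

module load singularity/latest

# LSB_HOSTS lists one host name per allocated slot; use that many OpenMP threads
export OMP_NUM_THREADS=$(echo $LSB_HOSTS | awk '{print NF}')

SINGULARITY="singularity run --nv /fs00/software/singularity-images/ngc_gromacs_2021.3.sif"

# Continue from the NVT run: -t nvt.cpt carries over the velocities and state from the checkpoint
${SINGULARITY} gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr
${SINGULARITY} gmx mdrun -nb gpu -ntmpi 2 -deffnm npt
```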

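Production simulation (md)

```bash
#BSUB -q 723090ib
#BSUB -gpu "num=4"

module load singularity/latest

# LSB_HOSTS lists one host name per allocated slot; use that many OpenMP threads
export OMP_NUM_THREADS=$(echo $LSB_HOSTS | awk '{print NF}')

SINGULARITY="singularity run --nv /fs00/software/singularity-images/ngc_gromacs_2021.3.sif"

# Continue from the NPT checkpoint, then offload nonbonded, bonded, PME (including FFT) and update to the GPUs
${SINGULARITY} gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr
${SINGULARITY} gmx mdrun -nb gpu -bonded gpu -update gpu -pme gpu -pmefft gpu -deffnm md_0_1
```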
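The four stages must run in order, because each one reads the previous stage's output (em.gro, then nvt.gro/nvt.cpt, then npt.gro/npt.cpt). One way to queue them all at once is LSF's `-J` job name together with the `-w "done(name)"` dependency condition, so that each stage starts only after the previous one has finished successfully. The sketch below only prints the `bsub` commands. It assumes the scripts are saved as em.lsf, nvt.lsf, npt.lsf and md.lsf (only em.lsf is named above); `chain_commands` and the job names are hypothetical, and the names should be unique among your jobs.

```bash
#!/bin/bash
# Print bsub commands that run the given stages in order; every stage after the
# first waits for the previous one to finish successfully (LSF "done()" condition).
# Assumption: each stage's script is saved as <stage>.lsf; job names are hypothetical.
chain_commands() {
  local prev=""
  local stage
  for stage in "$@"; do
    if [ -z "$prev" ]; then
      echo "bsub -J $stage < $stage.lsf"
    else
      echo "bsub -J $stage -w \"done($prev)\" < $stage.lsf"
    fi
    prev=$stage
  done
}

chain_commands em nvt npt md
```

Review the printed commands, then pipe them to `sh` to submit (for example `bash chain.sh | sh`, where `chain.sh` is whatever file holds the sketch).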

Software information:

```text
GROMACS version:    2021.3-dev-20210818-11266ae-dirty-unknown
Precision:          mixed
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  AVX2_256
FFT library:        fftw-3.3.9-sse2-avx-avx2-avx2_128-avx512
CUDA driver:        11.20
CUDA runtime:       11.40
```

Test case:

218,234 atoms (401 protein residues, 68,414 SOL, 9 ion residues)

`nsteps = 100000000 ; 200 ns`
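The `nsteps` line fixes the run length as nsteps × dt. The sketch below checks the 200 ns figure, assuming the 2 fs step (`dt = 0.002` ps) that the comment implies (check the `dt` line of md.mdp), and converts a measured ns/day rate into the hour/ns value reported in the results and into the wall-clock time for the whole run. The function names are hypothetical helpers, not GROMACS tools.

```bash
#!/bin/bash
# Run-length and timing arithmetic for the test case.
# Assumption: dt = 0.002 ps (2 fs), implied by the "; 200 ns" comment.

# Simulated length in ns = nsteps * dt (ps) / 1000
sim_length_ns() {
  awk -v n="$1" -v dt="$2" 'BEGIN { printf "%.1f\n", n * dt / 1000 }'
}

# hour/ns = 24 / (ns/day)
hours_per_ns() {
  awk -v r="$1" 'BEGIN { printf "%.3f\n", 24 / r }'
}

# Wall-clock hours for a run of the given length (ns) at the given rate (ns/day)
wall_hours() {
  awk -v len="$1" -v r="$2" 'BEGIN { printf "%.1f\n", len * 24 / r }'
}

sim_length_ns 100000000 0.002   # prints 200.0
hours_per_ns 117.428            # prints 0.204
wall_hours 200 117.428          # prints 40.9
```

At task 1's 117.428 ns/day, for example, the full 200 ns run needs roughly 40.9 hours of wall-clock time.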

eScience Center GPU tests: energy minimization (em) and equilibration (nvt, npt) were run on two GPUs, and the production simulation (md) on four GPUs.

For em, nvt and npt the table gives CPU, run and turnaround times; for md it gives the simulation performance.

| Task | Queue (em / nvt / npt / md) | CPU time, s (em / nvt / npt) | Run time, s (em / nvt / npt) | Turnaround time, s (em / nvt / npt) | md (ns/day) | md (hour/ns) |
|---|---|---|---|---|---|---|
| 1 | 72rtxib / 722080tiib / 722080tiib / 723090ib | 1168.45 / 13960.33 / 42378.71 | 79 / 1648 / 5586 | 197 / 1732 / 5661 | 117.428 | 0.204 |
| 2 | 72rtxib / 722080tiib / 722080tiib / 722080tiib | 1399.30 / 15732.66 / 40568.04 | 93 / 1905 / 5236 | 181 / 1991 / 5479 | 106.862 | 0.225 |
| 3 | 72rtxib / 72rtxib / 72rtxib / 72rtxib | 1368.11 / 5422.49 / 5613.74 | 92 / 355 / 366 | 180 / 451 / 451 | 103.213 | 0.233 |
| 4 | 72rtxib / 72rtxib / 72rtxib / 722080tiib | 1321.15 / 5441.60 / 5618.87 | 89 / 356 / 369 | 266 / 440 / 435 | 111.807 | 0.215 |
| 5 | 72rtxib / 72rtxib / 72rtxib / 72rtxib | 1044.17 / 5422.94 / 5768.44 | 72 / 354 / 380 | 162 / 440 / 431 | 110.534 | 0.217 |
| 6 | 723090ib / 723090ib / 723090ib / 723090ib | 1569.17 / 7133.74 / 6677.25 | 81 / 326 / 325 | 75 / 320 / 300 | 114.362 | 0.210 |
| 7 | 723090ib / 723090ib / 723090ib / 722080tiib | 1970.56 / 5665.71 / 6841.73 | 91 / 253 / 327 | 123 / 251 / 328 | 111.409 | 0.215 |
| 8 | 72rtxib / 72rtxib / 72rtxib / 72rtxib | 1234.24 / 5540.59 / 5528.91 | 108 / 363 / 370 | 85 / 364 / 363 | 114.570 | 0.209 |
| 9 | 723090ib / 723090ib / 723090ib / 723090ib | 2016.10 / 7633.83 / 7983.58 | 93 / 342 / 361 | 130 / 377 / 356 | 115.695 | 0.207 |
| 10 | 723090ib / 723090ib / 723090ib / 72rtxib | 1483.84 / 7025.65 / 7034.90 | 68 / 317 / 333 | 70 / 319 / 316 | 102.324 | 0.235 |
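The mean ± standard deviation figures in the conclusions can be recomputed from this table. The sketch below is a hypothetical awk helper that prints the mean and the sample standard deviation (n - 1 in the denominator); applied to the six 72rtxib em run times (tasks 1, 2, 3, 4, 5 and 8) it gives 88.83 ± 12.45 s.

```bash
#!/bin/bash
# Mean and sample standard deviation (n - 1 denominator) of the arguments.
mean_sd() {
  printf '%s\n' "$@" | awk '
    { x[NR] = $1; sum += $1 }
    END {
      mean = sum / NR
      for (i = 1; i <= NR; i++) ss += (x[i] - mean) ^ 2
      sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0
      printf "%.2f +- %.2f\n", mean, sd
    }'
}

# em run times (s) of tasks 1, 2, 3, 4, 5 and 8 on 72rtxib
mean_sd 79 93 92 89 72 108   # prints 88.83 +- 12.45
```

The same helper applied to the md rates of tasks 1, 6 and 9 on 723090ib (`mean_sd 117.428 114.362 115.695`) prints 115.83 +- 1.54.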

Conclusions (mean ± standard deviation over the tasks run on each queue):

  1. Energy minimization (em): run times on the 72rtxib and 723090ib queues were 88.83 ± 12.45 s and 83.25 ± 11.44 s, respectively.

  2. Equilibration (nvt): run times on the 722080tiib, 72rtxib and 723090ib queues were 1776.50 ± 181.73 s, 357.00 ± 4.08 s and 309.50 ± 39.06 s, respectively.

  3. Equilibration (npt): run times on the 722080tiib, 72rtxib and 723090ib queues were 5411.00 ± 247.49 s, 371.25 ± 6.08 s and 336.50 ± 16.68 s, respectively.

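  4. Production simulation (md): performance differed by less than 8% between the queues, at 110.03 ± 2.75 ns/day on 722080tiib, 107.66 ± 5.90 ns/day on 72rtxib and 115.83 ± 1.54 ns/day on 723090ib; 723090ib was both the fastest and the most stable.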

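  5. In summary, the **72rtxib** queue, which has few queued jobs, is recommended for energy minimization (em) and equilibration (nvt, npt). For the production simulation (md), choose a queue based on the number of queued jobs (in the author's experience, 72rtxib < 722080tiib < 723090ib < 83a100ib) and on GPU pricing (for on-campus and Collaborative Innovation Center users: 72rtxib CNY 1.8 per GPU-hour = CNY 0.45 per core-hour; 722080tiib CNY 1.2 per GPU-hour = CNY 0.3 per core-hour; 723090ib CNY 1.8 per GPU-hour = CNY 0.3 per core-hour; 83a100ib CNY 4.8 per GPU-hour = CNY 0.3 per core-hour).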
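To weigh price against measured speed, the GPU cost of a production run can be estimated from its ns/day rate. This is a rough sketch under a loudly stated assumption: billing is taken to be per allocated GPU per wall-clock hour at the on-campus prices listed above, which may not match the actual billing rules. `run_cost` is a hypothetical helper.

```bash
#!/bin/bash
# Rough GPU cost (CNY) = wall-clock hours * number of GPUs * price per GPU-hour.
# Assumption: billing per allocated GPU per wall-clock hour; actual rules may differ.
# $1 = length (ns), $2 = performance (ns/day), $3 = number of GPUs, $4 = CNY per GPU-hour
run_cost() {
  awk -v len="$1" -v r="$2" -v g="$3" -v p="$4" \
    'BEGIN { printf "%.1f\n", len * 24 / r * g * p }'
}

# 200 ns on 4 GPUs of 723090ib (CNY 1.8 per GPU-hour) at task 9's 115.695 ns/day
run_cost 200 115.695 4 1.8   # prints 298.7
# 200 ns on 4 GPUs of 722080tiib (CNY 1.2 per GPU-hour) at task 7's 111.409 ns/day
run_cost 200 111.409 4 1.2   # prints 206.8
```

Under this assumption the 722080tiib run is cheaper despite being slightly slower; queue waiting time still has to be weighed separately.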