GROMACS Performance on Intel Instances on AWS

Why Intel for HPC in the Cloud

  • Technology partnerships with leading ingredient providers to ensure optimization for Intel® CPUs.

  • Deep ISV and HPC community collaborations focused on optimization for leading HPC codes.

  • Scalability and flexibility for varying workloads in the cloud environment.

Intel Instances for HPC Workloads

The tests below were conducted on AWS instances based on several generations of Intel® Xeon® Scalable processors, with Hyper-Threading enabled. The 3rd Gen Intel® Xeon® Scalable processor in the C6i and M6i instances is a custom processor that can reach an all-core turbo frequency of up to 3.5GHz and features Intel® Turbo Boost Technology 2.0, Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and Intel® Deep Learning Boost. Compared with the prior generation, these instances offer a better value proposition for general-purpose and memory-intensive workloads, including higher performance, greater scalability, and an upgraded CPU class.

What Is GROMACS?

GROMACS is a molecular dynamics package and a compute-bound (FLOPS-limited) application. Its workloads are sensitive to latency in any communication: socket-to-socket, CPU-GPU, and multi-node. GROMACS benefits from Intel AVX-512, Intel Turbo Boost, and Hyper-Threading (SMT). The workloads are compute bound, with one exception: ionchannel becomes MPI bound on 8 to 16 nodes.
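Because these gains depend on the host CPU exposing AVX-512 and on GROMACS being built for it, a quick check of the host can help. The following is a minimal Python sketch, assuming a Linux host with /proc/cpuinfo. The level names are values of the GROMACS `GMX_SIMD` CMake option, but the flag-to-level mapping and the helper names (`cpu_flags`, `suggest_gmx_simd`) are simplifications of our own; GROMACS's build-time detection remains authoritative.

```python
from pathlib import Path

# Ordered widest to narrowest. The CPU flags paired with each GMX_SIMD level
# are a simplified assumption, not the exact rules GROMACS itself applies.
SIMD_LEVELS = [
    ("AVX_512", {"avx512f"}),
    ("AVX2_256", {"avx2", "fma"}),
    ("AVX_256", {"avx"}),
    ("SSE4.1", {"sse4_1"}),
]


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags listed on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def suggest_gmx_simd(flags: set[str]) -> str:
    """Pick the widest GMX_SIMD level whose required flags are all present."""
    for level, required in SIMD_LEVELS:
        if required <= flags:
            return level
    return "None"


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # only meaningful on Linux
        flags = cpu_flags(cpuinfo.read_text())
        print("Suggested GMX_SIMD:", suggest_gmx_simd(flags))
        # The 'ht' flag reports CPU capability, not whether SMT is active.
        print("ht flag present:", "ht" in flags)
```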

The workloads that we have considered for our benchmarking are publicly available:

  • lignocellulose (3M atoms, reaction-field (RF) electrostatics); a useful workload for demonstrating scalability.
  • water_rf (1.5M atoms, RF electrostatics)

See below for workloads and configurations. Results may vary.

Configuration of C6i.32xlarge – 3rd Gen Intel® Xeon® Scalable Processor @ 2.9GHz, 256GB Memory Capacity, Network bandwidth 50 Gbps, CentOS Linux 7 release kernel 3.10.0-1160.45.1.el7.x86_64, GROMACS version 2021.3, icc 2021.4.0 20210910, Intel® MPI Library for Linux OS, Version 2021.4 Build 20210831 (id: 758087adf), Tested by Intel as of 11/09/2021

Configuration of C5n.18xlarge – Intel® Xeon® Scalable Processor @ 2.9GHz, 192GB Memory Capacity, Network bandwidth 100 Gbps, CentOS Linux 7 release kernel 3.10.0-1160.45.1.el7.x86_64, GROMACS version 2021.3, icc 2021.4.0 20210910, Intel® MPI Library for Linux OS, Version 2021.4 Build 20210831 (id: 758087adf), Tested by Intel as of 11/09/2021

Configuration of M6i.32xlarge – 3rd Gen Intel® Xeon® Scalable Processor @ 2.9GHz, 512GB Memory Capacity, Network bandwidth 50 Gbps, CentOS Linux 7 release kernel 3.10.0-1160.45.1.el7.x86_64, GROMACS version 2021.3, icc 2021.4.0 20210910, Intel® MPI Library for Linux OS, Version 2021.4 Build 20210831 (id: 758087adf), Tested by Intel as of 11/09/2021
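The configurations above list the toolchain (GROMACS 2021.3, icc 2021.4, Intel MPI 2021.4) but not the build options or launch line. The Python sketch below shows one plausible way to assemble them. The CMake options and `mdrun` flags exist in GROMACS 2021, but choosing them here (MKL for FFTs, the hypothetical `benchmark.tpr` input, two OpenMP threads per rank) is an assumption, not the configuration Intel used. With Hyper-Threading enabled, a c6i.32xlarge or m6i.32xlarge node exposes 128 logical CPUs, which the sketch fills with MPI ranks times OpenMP threads.

```python
import shlex


def cmake_configure_args(source_dir: str = "..") -> list[str]:
    """Hypothetical configure line for an MPI + AVX-512 GROMACS build
    using the classic Intel compilers named in the configuration notes."""
    return [
        "cmake", source_dir,
        "-DCMAKE_C_COMPILER=icc",
        "-DCMAKE_CXX_COMPILER=icpc",
        "-DGMX_MPI=ON",           # builds the MPI-enabled gmx_mpi binary
        "-DGMX_SIMD=AVX_512",     # compile the AVX-512 SIMD kernels
        "-DGMX_FFT_LIBRARY=mkl",  # assumption: FFTs from Intel MKL
    ]


def mdrun_launch_args(nodes: int, logical_cpus_per_node: int,
                      omp_threads: int, tpr: str = "benchmark.tpr") -> list[str]:
    """Hypothetical Intel MPI launch line that places ranks x OpenMP threads
    on every logical CPU (Hyper-Threading on). The host list is assumed to
    come from the job scheduler, so no hostfile is passed."""
    if logical_cpus_per_node % omp_threads != 0:
        raise ValueError("omp_threads must divide logical_cpus_per_node")
    ranks_per_node = logical_cpus_per_node // omp_threads
    return [
        "mpirun", "-n", str(nodes * ranks_per_node), "-ppn", str(ranks_per_node),
        "gmx_mpi", "mdrun", "-s", tpr,
        "-ntomp", str(omp_threads),
        "-resethway",   # reset performance counters halfway through the run
        "-noconfout",   # skip writing the final configuration
    ]


if __name__ == "__main__":
    print(shlex.join(cmake_configure_args()))
    print(shlex.join(mdrun_launch_args(nodes=2, logical_cpus_per_node=128,
                                       omp_threads=2)))
```

For two nodes with 128 logical CPUs each and two OpenMP threads per rank, the launch line comes out as `mpirun -n 128 -ppn 64 gmx_mpi mdrun -s benchmark.tpr -ntomp 2 -resethway -noconfout`.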

How to Get Intel Benefits

3rd Gen Intel Xeon Scalable processors deliver significant performance gains for these GROMACS workloads, accelerated by Intel AVX-512 and Intel Deep Learning Boost. At lower node counts the gain is greater than 2x. The advantage narrows at larger node counts because C6i.32xlarge and M6i.32xlarge have lower network bandwidth (50 Gbps) than C5n.18xlarge (100 Gbps). Customers running these GROMACS workloads can realize significant performance gains by deploying on 3rd Gen Intel Xeon Scalable instance types at AWS (M6i, C6i) rather than on instances with previous-generation Intel Xeon Scalable processors.
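The scaling behavior described above can be quantified from the ns/day figure that GROMACS `mdrun` reports on the `Performance:` line of md.log. The Python sketch below is illustrative only, assuming that log format and a single-node run as the baseline; the function names are hypothetical. Relative speedup compares two instance types at the same node count, and parallel efficiency shows how much of ideal linear scaling survives as nodes are added, which is where a network bandwidth limit would show up.

```python
def ns_per_day_from_log(log_text: str) -> float:
    """Assumes the md.log summary line 'Performance: <ns/day> <hour/ns>'."""
    for line in log_text.splitlines():
        if line.strip().startswith("Performance:"):
            return float(line.split()[1])
    raise ValueError("no 'Performance:' line found")


def relative_speedup(new_ns_per_day: dict[int, float],
                     old_ns_per_day: dict[int, float]) -> dict[int, float]:
    """For each node count measured on both instance types,
    how many times faster the new type is than the old one."""
    common = sorted(new_ns_per_day.keys() & old_ns_per_day.keys())
    return {n: new_ns_per_day[n] / old_ns_per_day[n] for n in common}


def parallel_efficiency(ns_per_day: dict[int, float]) -> dict[int, float]:
    """Throughput at n nodes divided by n times the single-node throughput
    (1.0 means perfect linear scaling). Requires a 1-node measurement."""
    base = ns_per_day[1]
    return {n: ns_per_day[n] / (n * base) for n in sorted(ns_per_day)}
```

Collecting one ns/day value per node count and instance type, then comparing `parallel_efficiency` across instance types, would indicate whether efficiency drops faster on the 50 Gbps instances.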
Resources: www.intel.com.tw/HPC