Intel® Omni-Path Architecture Performance Tested for HPC
Intel® OPA Performance
This figure compares latency for 8-byte messages on Intel® OPA relative to Enhanced Data Rate* (EDR) InfiniBand* (IB), as measured with the Ohio State University (OSU) Micro-Benchmarks (OMB) osu_latency benchmark for both Open MPI and Intel® MPI. Intel® OPA latency has been measured up to 11% lower than EDR IB*. This latency includes one switch hop for both Intel® OPA and EDR IB*.
This figure compares bandwidth for 1 MB messages on Intel® OPA relative to EDR, as measured with the OSU OMB osu_bw benchmark for both Open MPI and Intel® MPI. Both Intel® OPA and EDR are capable of delivering nearly the full wire rate of 100 Gbps.
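As a rough aid for reading the bandwidth figure, the sketch below converts the nominal 100 Gbps wire rate into bytes per second and estimates the ideal time to move one 1 MB message. It is back-of-the-envelope arithmetic only: it assumes "1 MB" means 2**20 bytes and ignores encoding and protocol overhead, so measured osu_bw results will sit somewhat below these figures. The function names are illustrative and are not part of any benchmark.

```python
# Back-of-the-envelope wire-rate arithmetic; ignores encoding and protocol overhead.
LINK_RATE_GBPS = 100          # nominal link rate quoted in the text
MESSAGE_BYTES = 1 << 20       # assumption: "1 MB" means 2**20 bytes


def wire_rate_bytes_per_s(gbps: float) -> float:
    """Convert a link rate in gigabits per second to bytes per second."""
    return gbps * 1e9 / 8


def transfer_time_us(message_bytes: int, gbps: float) -> float:
    """Ideal time to move one message at full wire rate, in microseconds."""
    return message_bytes / wire_rate_bytes_per_s(gbps) * 1e6


if __name__ == "__main__":
    rate = wire_rate_bytes_per_s(LINK_RATE_GBPS)
    print(f"{rate / 1e9:.1f} GB/s")
    print(f"{transfer_time_us(MESSAGE_BYTES, LINK_RATE_GBPS):.2f} us per 1 MB message")
```

Under these assumptions the ceiling is 12.5 GB/s, which is the number to compare against when a benchmark reports bandwidth in bytes rather than bits.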
This figure compares the 8-byte message rate for Intel® OPA relative to EDR, as measured with the OSU OMB osu_mbw_mr benchmark for both Open MPI and Intel® MPI, using 32 MPI rank pairs. The Intel® OPA message rate has been measured up to 64% higher than EDR. Message coalescing is disabled at the MPI level, so this is a true hardware message rate test with no coalescing in software.
Natural Order Ring (NOR) Latency
Natural order ring (NOR) latency is measured with the b_eff benchmark from the HPC Challenge* benchmark suite. These measurements demonstrate the ability of the fabric to sustain low latency as an HPC application scales across the cluster. Intel® OPA has lower latency at 16 fully subscribed nodes using 32 MPI ranks per node.
Random Order Ring (ROR) Latency
Random order ring (ROR) latency is measured with the b_eff benchmark from the HPC Challenge* benchmark suite. These measurements demonstrate the ability of the fabric to sustain low latency as an HPC application scales across the cluster. Intel® OPA has lower latency at 16 fully subscribed nodes using 32 MPI ranks per node.
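To illustrate why the two ring orderings stress the fabric differently, the sketch below builds a natural-order and a random-order ring for 16 nodes with 32 ranks each and counts how many ring links cross a node boundary. It is not the b_eff implementation. It assumes block rank placement (ranks 0 to 31 on the first node, and so on), and all function names are hypothetical.

```python
import random

NODES = 16            # from the text
RANKS_PER_NODE = 32   # from the text


def node_of(rank: int, ranks_per_node: int = RANKS_PER_NODE) -> int:
    """Assumed block placement: ranks 0..31 on node 0, 32..63 on node 1, ..."""
    return rank // ranks_per_node


def ring_links(order):
    """Return (sender, receiver) pairs for a ring that visits ranks in `order`."""
    n = len(order)
    return [(order[i], order[(i + 1) % n]) for i in range(n)]


def inter_node_links(order) -> int:
    """Count ring links whose two endpoints sit on different nodes."""
    return sum(node_of(a) != node_of(b) for a, b in ring_links(order))


def natural_order(n_ranks: int):
    """Ranks visited in increasing rank number."""
    return list(range(n_ranks))


def random_order(n_ranks: int, seed: int = 0):
    """Ranks visited in a seeded random permutation."""
    order = list(range(n_ranks))
    random.Random(seed).shuffle(order)
    return order


if __name__ == "__main__":
    n = NODES * RANKS_PER_NODE
    print("natural ring inter-node links:", inter_node_links(natural_order(n)))
    print("random ring inter-node links:", inter_node_links(random_order(n)))
```

With block placement, only 16 of the 512 links in the natural ring leave a node, so most traffic stays in shared memory. A random ring sends most of its links across the fabric, which is why the two latencies are reported separately.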
|Test platform||Intel® Xeon® Processor E5-2697A v4 dual-socket servers (16 cores, 40 MB cache, 2.6 GHz, 9.6 GT/s Intel® QuickPath Interconnect, 145 W TDP) with 64 GB DDR4 memory @ 2133 MHz. Intel® Turbo Boost Technology and Intel® Hyper-Threading Technology enabled|
|Benchmark||Ohio State Micro Benchmarks* v. 5.0|
|Operating system||Red Hat Enterprise Linux* 7.2|
|Intel® MPI||Intel® MPI 5.1.3|
|Open MPI*||Open MPI 1.10.0|
|Intel® OPA hardware and settings||shm:tmi fabrics, I_MPI_TMI_DRECV=1, Intel Corporation Device 24f0 – Series 100 HFI ASIC (B0 silicon). OPA Switch: Series 100 Edge Switch – 48 port (B0 silicon). IOU Non-posted Prefetch disabled in BIOS. Snoop hold-off timer = 9|
|EDR hardware and settings||shm:dapl fabric. -genv I_MPI_DAPL_EAGER_MESSAGE_AGGREGATION off (Intel® MPI only). Mellanox* EDR ConnectX-4 Single Port Rev 3 MCX455A HCA. Mellanox SB7700 36-port EDR InfiniBand switch. MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0). Best of default, MXM_TLS=self,rc, and -mca pml yalla tunings|
a. osu_latency, 8 B message
b. osu_bw, 1 MB message
c. osu_mbw_mr, 8 B message (uni-directional), 32 MPI rank pairs. Maximum rank pair communication time is used instead of the average time; average timing was introduced into Ohio State Micro Benchmarks as of v3.9 (2/28/13). EDR results use the shm:ofa fabric, since this returned better message rates than the shm:dapl fabric with I_MPI_DAPL_EAGER_MESSAGE_COALESCING disabled
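The effect of footnote c's timing choice can be sketched as follows. This is a minimal illustration, not the osu_mbw_mr source: the function name, its parameters, and the sample timings are all hypothetical. It shows that dividing the total message count by the slowest pair's time gives a more conservative aggregate rate than dividing by the mean time.

```python
def message_rate(pair_times_s, messages_per_pair, use_max=True):
    """Aggregate messages per second across all rank pairs.

    pair_times_s: elapsed time measured by each rank pair, in seconds.
    use_max=True divides by the slowest pair's time (the method in footnote c);
    use_max=False divides by the mean time across pairs.
    """
    if not pair_times_s:
        raise ValueError("need at least one rank pair")
    total_messages = messages_per_pair * len(pair_times_s)
    if use_max:
        elapsed = max(pair_times_s)
    else:
        elapsed = sum(pair_times_s) / len(pair_times_s)
    return total_messages / elapsed


if __name__ == "__main__":
    # Hypothetical timings for 4 pairs, invented purely for illustration.
    times = [1.0, 1.0, 1.0, 2.0]
    print(message_rate(times, 1000, use_max=True))    # slowest-pair timing
    print(message_rate(times, 1000, use_max=False))   # mean timing
```

With these made-up inputs the slowest-pair method reports 2000 messages per second and the mean method reports 3200, so a single straggling pair lowers the reported rate under the maximum-time method but is partly hidden by averaging.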
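Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, including SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of these factors may cause the results to vary. You should consult other information and performance tests to help you fully evaluate the products you are considering purchasing, including the performance of that product when combined with other products. For more complete information, visit http://www.intel.com.tw/benchmarks.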