KISTI: Pushing Science and Technology Boundaries

With Intel® Xeon® Scalable processors, NURION is the largest supercomputer in South Korea.

Executive Summary
No longer strictly focused on computationally intensive workloads, modern HPC centers need performant yet general-purpose systems that can address the many challenging and conflicting resource demands required to achieve scientific breakthroughs across a wide array of increasingly complex memory and data-intensive research projects. Further, world-class supercomputers such as the Korea Institute of Science and Technology Information (KISTI) NURION system are also flagship technology tools procured by an organization to provide for the future—be it in science or to meet the economic needs of a region.

According to Dr. Hee-yoon Choi (KISTI president), “KISTI will grow with the industry, academy, and institute community as a central organization to support the dynamic science and technology data ecosystem, which shares data and creates value, laying a foundation for Korea’s innovation growth.”1 Equipped with Intel® Xeon® Scalable and Intel® Xeon Phi™ processors linked via an Intel® Omni-Path Architecture (Intel® OPA) communications fabric, the NURION 146-rack Cray* CS500 cluster was procured to expand and increase the pace of innovative R&D. It is the largest supercomputer in South Korea and, as of the November 2018 TOP500 list, the 13th fastest supercomputer in the world.2

Challenge
Scalability and the need to solve large-scale PDE problems, which involve sparse matrix operations, were key technology motivators in the KISTI procurement of a powerful new leadership-class supercomputer. Very simply, researchers had outgrown the existing decade-old TACHYON-II cluster.

Materials research is one of the application areas that KISTI has focused on as a leading HPC R&D institute, because it has strong potential to advance semiconductor device design, which is important for the national competitiveness of South Korea. In particular, KISTI has pursued the ability to simulate large-scale solid atomic structures on HPC systems.

Dr. Soonwook Hwang (General Director and Principal Researcher, Division of National Supercomputing at KISTI) explains, “Electronic structure simulation of realistically sized solid structures is quite critical to help experimentalists who work on designs of new materials or advanced electronic devices. With large-scale simulations, we expect to cover design factors for nanoscale devices with large-scale simulations that can predict physical behaviors of solid structures having up to several million atoms.”

Approach
Efficiently utilizing a large number of many- and multi-core processors at scale, as well as chip-level vector parallelism, requires detailed scientific and engineering knowledge. While KISTI has firmly held the leadership of HPC R&D in South Korea during the last decade with the TACHYON-II cluster, NURION introduced new levels of technology. Dr. Hwang explains, “Our Intel® Parallel Computing Center (Intel® PCC) project has served as a great opportunity for us to better understand and utilize the many- and multi-core Intel® processors. With the NURION system, now we are ready to broaden the leadership of HPC R&D in the Republic of Korea.”

Results
The Intel PCC collaborative effort has paid off quickly: KISTI researchers have already achieved significant success even though NURION was only recently installed and is just starting to be made available to public users.

The Intel PCC project has focused on developing a software package for tight-binding simulations of large-scale electronic structures. Dr. Hoon Ryu (Intel PCC Lead and Principal Researcher, Center for Applied Scientific Computing at KISTI) notes, “The code is useful for advanced semiconducting devices, which is a key national business of South Korea.” KISTI was the first Intel PCC in the Asia-Pacific area starting in 2013.

Dr. Ryu continues, “This work basically needs to solve a Schrödinger equation that normally involves nanostructures consisting of tens of millions of atoms, which are numerically described with system matrices of a billion degrees of freedom. As a result, scalable processors are definitely needed with parallelization of core numerical operations including eigenvalue problems involving large-scale system matrices. With Intel Xeon Phi processors, we are able to drive a huge reduction of end-to-end simulation times for millions of atomic systems.”

NURION Supercomputer Highlights

  • The 13th fastest supercomputer in the world as of the November 2018 TOP500 list2
  • Equipped with both Intel Xeon Scalable processors and Intel Xeon Phi processors and utilizing Intel Omni-Path Architecture, it is the largest supercomputer in South Korea
  • Designed to provide the resources to achieve scientific breakthroughs for a wide array of increasingly complex, data-intensive challenges across modeling, simulation, analytics, and AI

Use Case: Scaling to 1,000,000+ Atoms
Dr. Min Sun Yeom (director and principal researcher, Center for Applied Scientific Computing at KISTI) says, “With tight-binding simulations of nanostructures having > 1,000,000 atoms on NURION system, we were able to explore the effect of size and structural engineering on band gap energies of physically realizable lead halide perovskite nanostructures within quite reasonable times. We also obtained the preliminary ideas for how to reduce the light-induced phase separation in halide mixtures, which would not be possible with DFT simulations that can normally handle solids consisting of hundreds of atoms.”

Metal halide perovskite is a promising material candidate for optoelectronic devices, which motivates empirical modelling of large-scale atomic structures. Such modelling can provide useful guidelines for device design, such as how to map optical gaps and how to alleviate light-induced phase separation (a bottleneck in LED designs). A key strength of empirical modelling is that it connects directly to experiments.

Connection of experiments and large-scale simulations (a) Experimental image of perovskite (CsPbBr3) quantum dots (Nano Letters 15, 3692-3696) (b) Dependency of band gap energies on quantum dot sizes. The KISTI numerical results connect nicely to experiment.

Dr. Ryu points out that the Intel® Math Kernel Library (Intel® MKL) helped scale their calculations: “Intel MKL (scalapack packages such as libmkl_scalapack_lp64 and libmkl_blacs_intelmpi_lp64) helped a lot to improve the scalability of our Schrödinger solver. We used the LANCZOS algorithm, a well-known iterative method to tackle large-scale eigenvalue problems, which has a numerical part that is hard to be MPI-parallelized by users and becomes a performance bottleneck as iterative processes continue. With the Intel MKL subroutines, we were able to reduce the corresponding computing load with improved scalability.”
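To make the Lanczos idea concrete, here is a minimal pure-Python sketch. It is not KISTI's solver and does not use Intel MKL or ScaLAPACK. The toy 1D nearest-neighbour chain Hamiltonian, the function names, and the use of full reorthogonalization are all assumptions chosen for illustration. The iteration reduces the matrix to a small tridiagonal one using only matrix-vector products, and the lowest eigenvalue of that tridiagonal matrix is then found by Sturm-count bisection.

```python
import math
import random


def chain_matvec(v, onsite=0.0, hop=-1.0):
    """Apply a toy 1D nearest-neighbour chain Hamiltonian to v (illustrative only)."""
    out = [onsite * vi for vi in v]
    for i in range(len(v) - 1):
        out[i] += hop * v[i + 1]
        out[i + 1] += hop * v[i]
    return out


def lanczos(matvec, n, steps, seed=0):
    """Run up to `steps` Lanczos iterations; return tridiagonal entries (alpha, beta)."""
    rng = random.Random(seed)
    q = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in q))
    basis = [[x / norm for x in q]]
    alpha, beta = [], []
    for j in range(steps):
        w = matvec(basis[j])
        alpha.append(sum(wi * qi for wi, qi in zip(w, basis[j])))
        # Full reorthogonalization (an assumption here) keeps the basis
        # orthogonal in floating point at the cost of extra dot products.
        for qk in basis:
            c = sum(wi * qi for wi, qi in zip(w, qk))
            w = [wi - c * qi for wi, qi in zip(w, qk)]
        b = math.sqrt(sum(x * x for x in w))
        if j == steps - 1 or b < 1e-12:
            break
        beta.append(b)
        basis.append([x / b for x in w])
    return alpha, beta


def count_below(alpha, beta, x):
    """Sturm count: number of eigenvalues of the tridiagonal matrix that are < x."""
    count, d = 0, 1.0
    for i, a in enumerate(alpha):
        off = beta[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - off / d
        if d == 0.0:
            d = 1e-300  # avoid division by zero on the next step
        if d < 0.0:
            count += 1
    return count


def lowest_eigenvalue(alpha, beta, tol=1e-12):
    """Bisect between Gershgorin bounds for the smallest tridiagonal eigenvalue."""
    m = len(alpha)
    radius = [(beta[i - 1] if i > 0 else 0.0) + (beta[i] if i < m - 1 else 0.0)
              for i in range(m)]
    lo = min(a - r for a, r in zip(alpha, radius))
    hi = max(a + r for a, r in zip(alpha, radius))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(alpha, beta, mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


n = 50
alpha, beta = lanczos(chain_matvec, n, steps=n)
ritz = lowest_eigenvalue(alpha, beta)
exact = -2.0 * math.cos(math.pi / (n + 1))  # analytic lowest level of this toy chain
print(ritz, exact)
```

In practice far fewer steps than the matrix dimension are used, in which case the result is an upper bound on the lowest eigenvalue. The dot products and vector updates in the loop are the kind of operations that are hard to parallelize by hand across MPI ranks, which is where a tuned library can help.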

Use Case: Many-core Performance on Sparse Matrix Operations
Leveraging previous work on first-generation Intel Xeon Phi coprocessors, Mr. Kyu Nam Cho (former research associate, Korea University, now principal engineer at Samsung Research, Samsung Electronics) says, “The performance of sparse matrix-vector multiplication, which is the core numerical operation needed to solve large-scale electronic structures, was not bad even when we worked with Intel first generation many-core processors (Intel Xeon Phi coprocessors) compared to Intel® Xeon® processors V3. The performance on the NURION Intel Xeon Phi nodes is much better, particularly when combined with MCDRAM.” Cho adds, “Another critical strength of Intel Xeon Phi processor-based systems is their ease of use, particularly if we consider the amount of work that must be performed to port the existing code to run on PCI-E add-in devices.”
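For readers unfamiliar with the operation, the following is a small sketch of sparse matrix-vector multiplication using the compressed sparse row (CSR) layout. The article does not say which storage format KISTI's code uses, so CSR, the helper names, and the 4-site toy matrix are assumptions for illustration only.

```python
def dense_to_csr(dense):
    """Convert a dense matrix (list of rows) into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # where the next row starts in `values`
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored nonzeros of A."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Toy 4-site chain matrix (on-site energy 0, hopping -1); illustrative values.
H = [
    [0.0, -1.0, 0.0, 0.0],
    [-1.0, 0.0, -1.0, 0.0],
    [0.0, -1.0, 0.0, -1.0],
    [0.0, 0.0, -1.0, 0.0],
]
values, col_idx, row_ptr = dense_to_csr(H)
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0, 4.0])
print(y)  # [-2.0, -4.0, -6.0, -3.0]
```

Each output element needs only a few arithmetic operations per value loaded, and the reads of `x` are indirect, so the kernel tends to be limited by memory bandwidth rather than compute. That is consistent with the benefit Cho attributes to MCDRAM.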

The KISTI Intel PCC found that the speedup delivered by the Intel Xeon Phi processor’s high-bandwidth memory (HBM) meant that a single node could take on a larger workload. Dr. Ryu points out that “inter-node scalability is quite nice.” Scalability tests demonstrate a speedup when increasing the number of computing nodes. The KISTI Intel PCC observed a 1.5-3x speedup3 when they made use of the HBM packaged with the many-core Intel Xeon Phi processor 7250 nodes. More recently, they successfully ran a structure of 0.4 billion atoms on the NURION system and verified strong scalability up to 2,500 computing nodes (170,000 computing cores).
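As a quick check of the numbers, and to show how strong-scaling results are usually summarized, here is a short sketch. The 68 cores per node comes from the configuration notes for the Intel Xeon Phi 7250 nodes. The wall-clock times in `example` are hypothetical placeholders, not KISTI measurements, and the function names are illustrative.

```python
def total_cores(nodes, cores_per_node=68):
    """Core count for a run on `nodes` Intel Xeon Phi 7250 nodes (68 cores each)."""
    return nodes * cores_per_node


def strong_scaling(timings):
    """Map {nodes: wall_time} to {nodes: (speedup, parallel_efficiency)}.

    Speedup is measured against the smallest node count in `timings`;
    efficiency is the speedup divided by the increase in node count.
    """
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    result = {}
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        result[nodes] = (speedup, speedup / (nodes / base_nodes))
    return result


print(total_cores(2500))  # 170000

# Hypothetical wall-clock times in seconds, for illustration only.
example = {1: 1200.0, 2: 640.0, 4: 350.0}
for nodes, (speedup, efficiency) in strong_scaling(example).items():
    print(nodes, round(speedup, 3), round(efficiency, 3))
```

A fixed problem size scales well in the strong sense when the efficiency stays close to 1 as nodes are added.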

Dr. Ryu points out that “Intel® technology matches with the purpose of KISTI HPC.” According to a statistical workload analysis performed at KISTI, approximately 50% of their workloads involve sparse matrix operations. This means the NURION supercomputer should perform well in meeting the needs of KISTI researchers across a wide range of research areas.

Performance Realized
The importance of large-scale simulations for advanced material research to South Korea cannot be overestimated, as evidenced by the money spent to procure a world-class supercomputer.4 For this reason, the KISTI Intel PCC critically evaluated the various hardware solutions upon which the NURION procurement could be based, including GPU-accelerated systems. Their results have been published in the literature for Intel processors5 6 7 and GPUs.8 They present solid technical evidence for why the choice for NURION was an Intel-based system that delivers 25.7 PFlop/s (Rpeak) and 13.9 PFlop/s (Rmax),3 ranking it at #13 on the November 2018 TOP500 list.2 Dr. Ryu is preparing a white paper that tells the full CPU vs. GPU story, to be published later this year.9
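The two TOP500 figures can be related with one line of arithmetic: the ratio of Rmax to Rpeak is the fraction of theoretical peak achieved on LINPACK. The sketch below uses only the two values quoted above; the function name is illustrative.

```python
def linpack_efficiency(rmax_pflops, rpeak_pflops):
    """Fraction of theoretical peak (Rpeak) achieved on LINPACK (Rmax)."""
    return rmax_pflops / rpeak_pflops


efficiency = linpack_efficiency(13.9, 25.7)
print(f"{efficiency:.1%}")  # 54.1%
```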

Strong scalability of end-to-end simulations. (a) The small-scale BMT target was to calculate the 5 lowest conduction band states in a 27x33x33 nm3 (~1.5 million atoms) Si:P quantum dot.10 The scalability is tested up to 3 computing nodes (204 cores). (b) The extremely large-scale BMT target was to calculate the 3 lowest conduction subbands in 2715x54x54 nm3 Si:P nanowires (0.4 billion atoms). The scalability here is tested up to 2,560 computing nodes (170,000 cores) on the NURION system.

However, the story does not stop with the NURION system: the KISTI Intel PCC is also evaluating the use of FPGAs for large-scale electronic structure calculations. In particular, the Intel Xeon Scalable processor family provides a pathway towards future FPGA acceleration.11 As with the GPU and Intel processor evaluations, the KISTI Intel PCC has been publishing its work on FPGAs as well.12

KISTI team members who enabled scalable simulations of extremely large electronic structures on the NURION system: (from left) Dr. Hoon Ryu, Dr. Ji-Hoon Kang (principal researcher, Center for Applied Scientific Computing), Mr. Taeyoung Hong (NURION operation team lead and senior researcher, Supercomputing Service Center).

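Notices and Disclaimers

Intel® technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer, or learn more at https://www.intel.com.tw // Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, visit https://www.intel.com.tw/benchmarks // Performance results are based on testing as of the dates shown in the configurations and may not reflect all publicly available security updates. See the configuration disclosure for details. No product or component can be absolutely secure. // Cost reduction scenarios described are intended as examples of how a given Intel® processor-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. // Intel does not control or audit third-party benchmark data or the websites referenced in this document. You should visit the referenced website and confirm whether the referenced data are accurate. // Results in some test cases have been estimated or simulated using Intel internal analysis or architecture simulation or modeling, and are provided for informational purposes only. Any differences in your system hardware, software, or configuration may affect your actual performance.

Product and Performance Information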

1 Intel Xeon Phi 7250 nodes; 68 cores/node using 2 MPI processes + 32 threads per node; Quad / Flat memory mode; 100G network connectivity. 2500 Intel Xeon Phi nodes, a total of 68x2500 computing cores were used for the benchmark test of KISTI’s in-house code. BIOS: S72C610.86B.01.03.0018.C0001.012420182107; Memory: 96GB DDR4-2400 memory + 16GB 7.2GT/s MCDRAM; Networking and Storage: Intel Omni-Path Architecture, 100Gb network connectivity; OS and Kernel details: CentOS Linux Release 7.3, Linux kernel 3.10.0-514.26.2.el7.x86-64; Application software: Quantum simulation tool for Advanced Nanoscale Devices; Tested by KISTI in November 2018.
2 Currently according to the November 2018 TOP500 list.
3 Test performed by KISTI in November 2018. Rmax is maximal LINPACK performance achieved; Rpeak is theoretical peak performance per TOP500.org. Configuration: Intel Xeon Phi 7250 nodes; Up to 272 (68x4) cores/node using 4 MPI processes + 68 threads per node; Quad/Flat memory mode; 10 G network connectivity.
7 Ji-Hoon Kang, Oh-Kyoung Kwon, Jinwoo Jeong, Kyunghun Lim, Hoon Ryu: Performance Evaluation of Scientific Applications on Intel Xeon Phi Knights Landing Clusters. HPCS 2018: 338-341.
8 GPU results were published in “Fast, energy-efficient electronic structure simulations for multi-million atomic systems with GPU devices” by Hoon Ryu and Oh-Kyoung Kwon in Journal of Computational Electronics (2018) 17:698–706, https://doi.org/10.1007/s10825-018-1138-4.
9 Please check Dr. Ryu’s publications list to see the article when it appears: https://www.researchgate.net/profile/Hoon_Ryu3
10 Si:P alloy structures have been popularly studied to build Si-based qubit systems. See Nature Nanotechnology 9, 430-435, and Nano Letters 15, 1, 450-456.