US now has first and third most powerful supercomputers

Lawrence Livermore National Laboratory (LLNL) unveiled Sierra, the world’s third-fastest supercomputer. Sierra has a peak performance of 125 petaFLOPS, or 125 quadrillion floating-point operations per second. Early indications using existing codes and benchmark tests are promising, demonstrating, as predicted, that Sierra can perform most required calculations far more efficiently in terms of cost and power consumption than systems consisting of CPUs alone. Depending on the application, Sierra is expected to be six to 10 times more capable than LLNL’s 20-petaFLOP Sequoia, currently the world’s eighth-fastest supercomputer.

Sierra is the National Nuclear Security Administration’s (NNSA) first large-scale production heterogeneous system, meaning each node incorporates both IBM central processing units (CPUs) and NVIDIA graphics processing units (GPUs). It is specifically designed for modeling and simulations essential for NNSA’s Stockpile Stewardship Program, ongoing life extension programs, weapons science and nuclear deterrence. It is expected to go into use for classified production in early 2019.

Sierra has 240 computing racks and 4,320 nodes and takes up 7,000 square feet. Each node has two IBM POWER9 CPUs, four NVIDIA V100 GPUs and a Mellanox EDR InfiniBand interconnect. To prepare for this architecture, LLNL has partnered with IBM and NVIDIA to rapidly develop codes and prepare applications that make effective use of the CPU/GPU nodes.
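As a quick check on these figures, the short Python sketch below multiplies out the per-node counts. The variable names are illustrative, and the nodes-per-rack value is only an average that assumes all 240 racks hold compute nodes evenly, which the article does not state.

```python
# Figures quoted in the article; names are illustrative, not official.
NODES = 4_320
RACKS = 240
CPUS_PER_NODE = 2   # IBM POWER9
GPUS_PER_NODE = 4   # NVIDIA V100

total_cpus = NODES * CPUS_PER_NODE
total_gpus = NODES * GPUS_PER_NODE

# Assumption: nodes are spread evenly across every rack.
avg_nodes_per_rack = NODES / RACKS

print(f"CPUs: {total_cpus:,}")
print(f"GPUs: {total_gpus:,}")
print(f"Average nodes per rack: {avg_nodes_per_rack:g}")
```

The GPU total of 17,280 is consistent with NVIDIA's statement that Sierra is equipped with more than 17,000 V100 GPUs.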

IBM and NVIDIA personnel worked closely with LLNL, both on-site and remotely, on code development and restructuring to achieve maximum performance, while LLNL personnel provided feedback on system design and the software stack to the vendors. This “center of excellence” co-design strategy is necessary to ensure that codes and platforms are well-matched and that applications are optimized for GPU-accelerated architecture. LLNL’s partnership with Oak Ridge National Laboratory, which is siting the Summit system from IBM, has also been extremely helpful throughout the project, from procurement to operation.

LLNL selected the IBM/NVIDIA system due to its energy and cost efficiency, as well as its potential to effectively run NNSA applications. Sierra’s IBM POWER9 processors feature CPU-to-GPU connection via NVIDIA NVLink interconnect, enabling greater memory bandwidth between each node so Sierra can move data throughout the system for maximum performance and efficiency. Backing Sierra is 154 petabytes of IBM Spectrum Scale, a software-defined parallel file system, deployed across 24 racks of Elastic Storage Servers (ESS). To meet the scaling demands of the heterogeneous systems, the solution delivers 1.54 terabytes per second in both read and write bandwidth and can manage 100 billion files per file system.
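To put the storage figures in perspective, the following Python sketch estimates how long it would take to stream the full 154 petabytes once at the quoted 1.54 terabytes per second. It assumes decimal units (1 PB = 1,000,000 GB) and bandwidth sustained at the quoted rate for the whole transfer; both are simplifying assumptions, not statements from LLNL or IBM.

```python
# Figures quoted in the article, expressed in gigabytes to keep integers exact.
# Assumption: decimal units (1 PB = 1,000,000 GB; 1 TB = 1,000 GB).
CAPACITY_GB = 154 * 1_000_000   # 154 petabytes
BANDWIDTH_GB_PER_S = 1_540      # 1.54 terabytes per second

# Assumption: the quoted bandwidth is sustained for the entire transfer.
seconds_to_stream = CAPACITY_GB / BANDWIDTH_GB_PER_S
hours_to_stream = seconds_to_stream / 3600

print(f"Seconds to read or write the full capacity: {seconds_to_stream:,.0f}")
print(f"Hours: {hours_to_stream:.1f}")
```

Under these assumptions, one full pass over the file system takes 100,000 seconds, or a little under 28 hours.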

“The next frontier of supercomputing lies in artificial intelligence,” said John Kelly, senior vice president, Cognitive Solutions and IBM Research. “IBM’s decades-long partnership with LLNL has allowed us to build Sierra from the ground up with the unique design and architecture needed for applying AI to massive data sets. The tremendous insights researchers are seeing will only accelerate high-performance computing for research and business.”

Because Sierra is the first NNSA production supercomputer backed by GPU-accelerated architecture, its acquisition required a fundamental shift in how scientists at the three NNSA laboratories program their codes to take advantage of the GPUs. The system’s NVIDIA GPUs also present scientists with an opportunity to investigate the use of machine learning and deep learning to accelerate the time-to-solution of physics codes. Simulation accelerated by artificial intelligence technology is expected to be increasingly employed over the coming decade.

“Sierra is a world-class, pre-exascale supercomputer that allows researchers to run large complex scientific simulations at scale, at speeds never before thought possible,” said Ian Buck, vice president and general manager of Accelerated Computing at NVIDIA. “Equipped with more than 17,000 of our Tesla Tensor Core V100 GPUs, Sierra is a powerful, universal platform for compute-intensive scientific simulations, machine learning, deep learning and visualization applications all in one — paving the path forward for the future of high-performance computing.”

Sierra also leverages Mellanox EDR 100 Gigabit InfiniBand In-Network Computing acceleration engines to achieve higher application performance and scalability.