China has overtaken the US in the number of systems on the TOP500 list, by a margin of 202 to 143. That is the largest number of supercomputers China has ever placed on the TOP500. Meanwhile, the US presence has shrunk to its lowest level since the list’s inception 25 years ago.
Just six months ago, the US led with 169 systems to China’s 160. Despite the reversal of fortunes, the 143 systems claimed by the US give it a solid second-place finish. Japan is third with 35 systems, followed by Germany with 20, France with 18, and the UK with 15.
China has also overtaken the US in aggregate performance. The Asian superpower now claims 35.4 percent of the TOP500 flops, with the US second at 29.6 percent.
The top 10 systems remain largely unchanged since the June 2017 list, with a couple of notable exceptions.
Sunway TaihuLight, a system developed by China’s National Research Center of Parallel Computer Engineering & Technology (NRCPC), and installed at the National Supercomputing Center in Wuxi, maintains its number one ranking for the fourth time, with a High Performance Linpack (HPL) mark of 93.01 petaflops.
Tianhe-2 (Milky Way-2), a system developed by China’s National University of Defense Technology (NUDT) and deployed at the National Supercomputer Center in Guangzhou, China, is still the number two system at 33.86 petaflops.
The original plan was to upgrade Tianhe-2 with Intel’s newer “Knights Landing” Xeon Phi processors. However, the US government then imposed an embargo on these chips to certain Chinese supercomputing sites, including the Guangzhou center, so NUDT had to come up with a plan B. In this case, that meant developing its own coprocessor. That turned out to be the Matrix-2000, a DSP-type chip tweaked for more general-purpose computation.
According to slides presented at the forum, each Matrix-2000 will deliver 2.4576 teraflops (peak). That is more than double the 1.0 teraflops delivered by the original Xeon Phi chip. The Matrix-2000 consists of 128 cores, each providing 16 double-precision flops per cycle. Those flops are delivered by a 256-bit vector unit, which, as Satoshi notes, is in line with the Knights Corner chip it replaces.
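Peak throughput is simply cores × flops per cycle × clock rate. The clock rate is not given above, so the sketch below infers it from the quoted peak, core count, and flops-per-cycle figures. Treat the resulting value as a derived number, not a published specification. The function name `implied_clock_ghz` is illustrative only.

```python
# Peak-throughput arithmetic for the Matrix-2000, using the figures quoted above.
# The clock rate is NOT stated in the article; it is inferred here from
# peak = cores * flops_per_cycle * clock, so treat it as an assumption.

CORES = 128              # cores per Matrix-2000
FLOPS_PER_CYCLE = 16     # double-precision flops per core per cycle
PEAK_TFLOPS = 2.4576     # quoted peak per chip
XEON_PHI_TFLOPS = 1.0    # quoted peak of the original Xeon Phi chip


def implied_clock_ghz(peak_tflops: float, cores: int, flops_per_cycle: int) -> float:
    """Clock rate (GHz) required for cores * flops_per_cycle * clock to equal the peak."""
    return peak_tflops * 1e12 / (cores * flops_per_cycle) / 1e9


clock = implied_clock_ghz(PEAK_TFLOPS, CORES, FLOPS_PER_CYCLE)
speedup = PEAK_TFLOPS / XEON_PHI_TFLOPS

print(f"implied clock: {clock:.2f} GHz")
print(f"peak vs. original Xeon Phi: {speedup:.2f}x")
```

The implied clock comes out at about 1.2 GHz. The ratio to the Xeon Phi figure is roughly 2.46, which is consistent with the "more than double" claim.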
Other enhancements include an interconnect that is 40 percent faster (to 14 Gbps) and has 50 percent lower latency (1 µs). This is likely the TH-Express-2+ that NUDT has talked about before. In addition, main memory has been bumped from 1.4 to 3.4 petabytes, slightly improving the bytes-to-flops ratio of Tianhe-2. Storage has also been enhanced in both capacity and I/O bandwidth.
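The relative improvements quoted above imply pre-upgrade figures that the article does not state directly. The sketch below backs them out. It assumes that "40 percent faster" means new = old × 1.4 and that "50 percent lower latency" means new = old × 0.5. Both readings are interpretations, and the helper `previous_value` is hypothetical.

```python
# Back out the pre-upgrade figures implied by the relative changes quoted above.
# ASSUMPTION: "40 percent faster" means new = old * 1.4, and
# "50 percent lower latency" means new = old * 0.5.

NEW_BANDWIDTH_GBPS = 14.0
BANDWIDTH_GAIN = 0.40        # "40 percent faster"
NEW_LATENCY_US = 1.0
LATENCY_CUT = 0.50           # "50 percent lower latency"
OLD_MEMORY_PB, NEW_MEMORY_PB = 1.4, 3.4


def previous_value(new: float, relative_change: float) -> float:
    """Invert new = old * (1 + relative_change) to recover old."""
    return new / (1.0 + relative_change)


old_bandwidth = previous_value(NEW_BANDWIDTH_GBPS, BANDWIDTH_GAIN)
old_latency = previous_value(NEW_LATENCY_US, -LATENCY_CUT)
memory_factor = NEW_MEMORY_PB / OLD_MEMORY_PB

print(f"implied previous bandwidth: {old_bandwidth:.1f} Gbps")
print(f"implied previous latency:   {old_latency:.1f} us")
print(f"memory growth factor:       {memory_factor:.2f}x")
# The bytes-to-flops ratio improves only if peak flops grew by less than
# memory_factor; the upgraded system's peak is not given in this article.
```

Under these readings, the previous interconnect ran at about 10 Gbps with about 2 µs latency, and memory grew by a factor of roughly 2.43.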
The US should have its 200-petaflop supercomputer running in 2018. The US is currently targeting 2023 for an exaflop supercomputer, while China is working toward one in the 2019-2020 timeframe.
Piz Daint, a Cray XC50 system installed at the Swiss National Supercomputing Centre (CSCS) in Lugano, Switzerland, maintains its number three position with 19.59 petaflops, reaffirming its status as the most powerful supercomputer in Europe. Piz Daint was upgraded last year with NVIDIA Tesla P100 GPUs, which more than doubled its HPL performance from 9.77 petaflops.
The new number four system is the upgraded Gyoukou supercomputer, a ZettaScaler-2.2 system deployed at Japan’s Agency for Marine-Earth Science and Technology, which was the home of the Earth Simulator. Gyoukou achieved an HPL result of 19.14 petaflops using PEZY-SC2 accelerators along with conventional Intel Xeon processors. The system’s 19,860,000 cores represent the highest level of concurrency ever recorded on the TOP500 rankings of supercomputers.
Titan, a five-year-old Cray XK7 system installed at the Department of Energy’s (DOE) Oak Ridge National Laboratory, and still the largest system in the US, slips to number five. Its 17.59 petaflops are mainly the result of its NVIDIA K20X GPU accelerators.
Sequoia, an IBM BlueGene/Q system installed at DOE’s Lawrence Livermore National Laboratory, is the number six system on the list with a mark of 17.17 petaflops. It was deployed in 2011.
The new number seven system is Trinity, a Cray XC40 supercomputer operated by Los Alamos National Laboratory and Sandia National Laboratories. It was recently upgraded with Intel “Knights Landing” Xeon Phi processors, which propelled it from 8.10 petaflops six months ago to its current high-water mark of 14.14 petaflops.
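Two of the systems above owe their current positions to upgrades: Piz Daint and Trinity. The sketch below computes each HPL speedup from the before-and-after petaflops figures quoted in the text. The dictionary keys and the `speedup` helper are illustrative names only.

```python
# HPL speedup from the two upgrades described above (figures in petaflops,
# as quoted in the text: before upgrade -> after upgrade).
UPGRADES = {
    "Piz Daint (Tesla P100 GPUs)": (9.77, 19.59),
    "Trinity (Knights Landing Xeon Phi)": (8.10, 14.14),
}


def speedup(before_pf: float, after_pf: float) -> float:
    """Ratio of post-upgrade to pre-upgrade HPL performance."""
    return after_pf / before_pf


for name, (before_pf, after_pf) in UPGRADES.items():
    print(f"{name}: {before_pf} -> {after_pf} PF ({speedup(before_pf, after_pf):.2f}x)")
```

Piz Daint's ratio lands just above 2, matching "more than doubled". Trinity's is roughly 1.75.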
Cori, a Cray XC40 supercomputer, installed at the National Energy Research Scientific Computing Center (NERSC), is now the eighth fastest supercomputer in the world. Its 1,630 Intel Xeon “Haswell” processor nodes and 9,300 Intel Xeon Phi 7250 nodes yielded an HPL result of 14.01 petaflops.
At 13.55 petaflops, Oakforest-PACS, a Fujitsu PRIMERGY CX1640 M1 installed at the Joint Center for Advanced High Performance Computing in Japan, is the number nine system. It too is powered by Intel “Knights Landing” Xeon Phi processors.
Fujitsu’s K computer installed at the RIKEN Advanced Institute for Computational Science (AICS) in Kobe, Japan, is now the number 10 system at 10.51 petaflops. Its performance is derived from its 88 thousand SPARC64 processors linked by Fujitsu’s Tofu interconnect. Despite its tenth-place showing on HPL, the K computer is the top-ranked system on the High-Performance Conjugate Gradients (HPCG) benchmark.
For the first time, each of the top 10 supercomputers delivered more than 10 petaflops on HPL. There are also 181 systems with performance greater than a petaflop, up from 138 on the June 2017 list. Taking a broader look, the combined performance of all 500 systems has grown to 845 petaflops, compared to 749 petaflops six months ago and 672 petaflops one year ago. Even though aggregate performance grew by nearly 100 petaflops, the relative increase is well below the list’s long-term historical trend.
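"Nearly 100 petaflops" sounds large, but the slowdown argument rests on relative growth. The sketch below computes the six-month and one-year growth rates from the three aggregate totals quoted above. The long-term historical rate is not given in the article, so the sketch does not attempt to compare against it. The `growth` helper is a hypothetical name.

```python
# Relative growth of aggregate TOP500 performance, from the totals quoted above.
TOTAL_PF = {"one year ago": 672, "six months ago": 749, "now": 845}


def growth(old: float, new: float) -> float:
    """Fractional growth from old to new (0.10 means +10 percent)."""
    return new / old - 1.0


absolute_gain = TOTAL_PF["now"] - TOTAL_PF["six months ago"]
six_month = growth(TOTAL_PF["six months ago"], TOTAL_PF["now"])
one_year = growth(TOTAL_PF["one year ago"], TOTAL_PF["now"])

print(f"absolute 6-month gain: {absolute_gain} PF")
print(f"6-month growth: {six_month:.1%}")
print(f"1-year growth:  {one_year:.1%}")
```

The absolute gain is 96 petaflops. That works out to roughly 13 percent over six months and roughly 26 percent over the year.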
A further reflection of this slowdown is the rate of turnover on the list. The entry point in the latest rankings moved up to 548 teraflops, compared to 432 teraflops in June. The system now in last place, at 548 teraflops, was ranked 370th on the previous TOP500 list.
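The two turnover figures can be turned into numbers. The first is how far the entry point rose. The second is a rough count of how many systems must have dropped off. That count assumes the list always holds exactly 500 systems and that every system ranked below 370th in June fell under the new 548-teraflop entry point (ignoring ties). Under those assumptions, it is only a lower bound, because systems retired from higher positions also free up slots. The variable names are illustrative.

```python
# Turnover arithmetic from the entry-point figures quoted above.
ENTRY_JUNE_TF = 432
ENTRY_NOV_TF = 548
PREVIOUS_RANK_OF_LAST_SYSTEM = 370   # June rank of the system now in last place
LIST_SIZE = 500

entry_growth = ENTRY_NOV_TF / ENTRY_JUNE_TF - 1.0

# ASSUMPTION: systems ranked below 370 in June are now under the entry point
# (ties ignored), so at least this many dropped off and were replaced.
min_systems_displaced = LIST_SIZE - PREVIOUS_RANK_OF_LAST_SYSTEM

print(f"entry point rose {entry_growth:.1%} in six months")
print(f"at least ~{min_systems_displaced} systems dropped off the list")
```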
The TOP500 list now incorporates High-Performance Conjugate Gradients (HPCG) benchmark results to provide a more balanced look at system performance. The benchmark combines sparse matrix multiplication, global collectives, and vector updates. This mix more closely represents the computational and data access patterns used in many supercomputing codes.
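The sketch below is a plain, unpreconditioned conjugate-gradient solve over a small sparse matrix. It shows how the three kinds of operations named above fit together. It is NOT the HPCG reference implementation, which adds a multigrid preconditioner with a Gauss-Seidel smoother and runs across many nodes. The CSR layout, the test matrix, and all function names are illustrative choices.

```python
# Minimal conjugate-gradient sketch showing the operation mix HPCG stresses:
# sparse matrix-vector products, dot products (global collectives on a real
# machine), and vector updates. Not the HPCG reference code.
from math import sqrt

# Sparse matrix in CSR form: (row_ptr, col_idx, values).
CSRMatrix = tuple[list[int], list[int], list[float]]


def spmv(a: CSRMatrix, x: list[float]) -> list[float]:
    """Sparse matrix-vector multiply: irregular, memory-bound access."""
    row_ptr, col_idx, vals = a
    return [
        sum(vals[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def dot(u: list[float], v: list[float]) -> float:
    """Dot product: on a distributed machine this needs a global all-reduce."""
    return sum(ui * vi for ui, vi in zip(u, v))


def axpy(alpha: float, x: list[float], y: list[float]) -> list[float]:
    """Vector update y + alpha * x: streaming, bandwidth-bound."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def conjugate_gradient(a: CSRMatrix, b: list[float],
                       tol: float = 1e-10, max_iter: int = 1000) -> list[float]:
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0
    p = list(r)              # search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if sqrt(rr) < tol:
            break
        ap = spmv(a, p)
        alpha = rr / dot(p, ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, ap, r)
        rr_new = dot(r, r)
        p = axpy(rr_new / rr, p, r)   # p = r + beta * p
        rr = rr_new
    return x


def laplacian_1d(n: int) -> CSRMatrix:
    """Tridiagonal [-1, 2, -1] matrix in CSR form (symmetric positive definite)."""
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


A = laplacian_1d(8)
b = [1.0] * 8
x = conjugate_gradient(A, b)
```

Each iteration performs one sparse product, two dot products, and three vector updates. That balance is why HPCG scores track memory bandwidth and network latency more than peak flops.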
As previously mentioned, the fastest system using the HPCG benchmark remains Fujitsu’s K computer, which is ranked number 10 in the overall TOP500 rankings. It achieved 602.7 teraflops on HPCG, followed closely by Tianhe-2 with a score of 580.0 teraflops. The upgraded Trinity supercomputer comes in at number three at 546.1 teraflops, followed by Piz Daint at number four with 486.4 teraflops, and Sunway TaihuLight at number five at 480.8 teraflops.
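One way to see why HPCG gives "a more balanced look" is to express each system's HPCG score as a fraction of its HPL score. The sketch below does that using only the figures quoted in this article. HPL results are in petaflops and HPCG results are in teraflops, so the helper converts units first. The `hpcg_fraction` name is illustrative.

```python
# HPCG result as a fraction of HPL result, using the figures quoted above.
# Tuple layout: (HPL in petaflops, HPCG in teraflops).
SYSTEMS = {
    "K computer": (10.51, 602.7),
    "Tianhe-2": (33.86, 580.0),
    "Trinity": (14.14, 546.1),
    "Piz Daint": (19.59, 486.4),
    "Sunway TaihuLight": (93.01, 480.8),
}


def hpcg_fraction(hpl_pf: float, hpcg_tf: float) -> float:
    """HPCG divided by HPL, after converting HPL from petaflops to teraflops."""
    return hpcg_tf / (hpl_pf * 1000.0)


ranked = sorted(SYSTEMS, key=lambda name: hpcg_fraction(*SYSTEMS[name]), reverse=True)
for name in ranked:
    print(f"{name:18s} {hpcg_fraction(*SYSTEMS[name]):.2%} of HPL")
```

On this measure, the K computer sustains roughly 5.7 percent of its HPL result on HPCG. Sunway TaihuLight sustains only about 0.5 percent, which is how the HPL leader ends up fifth on HPCG.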