World's number two supercomputer to nearly double its performance to 95 petaflops

China is upgrading its number two supercomputer with new Chinese-made Matrix-2000 GPDSP accelerators. They will replace the existing Intel Knights Corner Xeon Phi coprocessors that were installed in the Tianhe-2 back in 2013. The upgraded supercomputer will be called the Tianhe-2A.

The original plan was to upgrade the system with the newer Knights Landing devices. But after the US government instituted an embargo on sales of these chips to certain Chinese supercomputing sites, including the Guangzhou center, the National University of Defense Technology (NUDT) had to come up with plan B. In this case, that meant developing its own coprocessor. That turned out to be the Matrix-2000, a DSP-type chip tweaked for more general-purpose computation.

According to slides presented at the forum, each Matrix-2000 will deliver 2.4576 teraflops (peak), which more than doubles the 1.0 teraflops delivered by the original Xeon Phi chip. The Matrix-2000 consists of 128 cores, each one providing 16 double precision flops per cycle. Those flops are delivered by a 256-bit vector unit, which, as Satoshi notes, is in line with the Knights Corner chip it replaces.
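The per-chip figures above also imply a clock speed, which the article does not state. A minimal sketch of that arithmetic, assuming peak flops is simply cores times flops per cycle times clock (the function name is made up for illustration):

```python
# Figures quoted in the article for one Matrix-2000 chip.
CORES = 128
FLOPS_PER_CYCLE_PER_CORE = 16      # double precision
PEAK_FLOPS = 2.4576e12             # 2.4576 teraflops

def implied_clock_hz(peak_flops, cores, flops_per_cycle):
    """Clock rate implied by peak = cores * flops_per_cycle * clock.

    This relation is an assumption of the sketch, not a vendor spec.
    """
    return peak_flops / (cores * flops_per_cycle)

clock_hz = implied_clock_hz(PEAK_FLOPS, CORES, FLOPS_PER_CYCLE_PER_CORE)
print(f"Implied clock: {clock_hz / 1e9:.2f} GHz")
```

Under that assumption the numbers work out to a clock of about 1.2 GHz.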

At least for the time being, the system will retain the original host CPUs from Tianhe-2, which are Intel Xeon processors. Each supercomputer node will pair two of those Intel CPUs with two Matrix-2000 coprocessors, hooked in via PCIe. The node count is being increased from 16,000 to 17,792.
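As a rough check on the headline number, the accelerator share of peak performance can be worked out from the node count and the per-chip peak quoted in the article. This is a sketch only: it ignores the host Xeons, whose contribution the article does not break out.

```python
# Figures quoted in the article.
NODES = 17_792
ACCELERATORS_PER_NODE = 2
MATRIX_2000_PEAK_TFLOPS = 2.4576
HEADLINE_PEAK_PFLOPS = 95.0        # from the article's headline

# Peak contributed by the Matrix-2000 coprocessors alone, in petaflops.
accel_pflops = NODES * ACCELERATORS_PER_NODE * MATRIX_2000_PEAK_TFLOPS / 1000.0

# Whatever is left would have to come from the host CPUs
# (an inference, not something the article states).
remainder_pflops = HEADLINE_PEAK_PFLOPS - accel_pflops

print(f"Accelerators: {accel_pflops:.2f} PF")
print(f"Remainder:    {remainder_pflops:.2f} PF")
```

The coprocessors alone account for roughly 87.5 petaflops, leaving about 7.5 petaflops of the headline figure for the host processors.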

Other enhancements include an interconnect that is 40 percent faster (up to 14 Gbps) and has 50 percent lower latency (1 microsecond). This is likely the TH-Express-2+ that NUDT has talked about before. In addition, main memory has been bumped from 1.4 to 3.4 petabytes, slightly improving the bytes-to-flops ratio of the Tianhe-2. Storage has also been enhanced in both capacity and I/O bandwidth.

Even though peak performance is going to nearly double, the system’s total power draw of 18 MW is just slightly more than that of the original system. That gives it a power efficiency of more than 5 gigaflops per watt, which would place it somewhere around the number 20 slot on the Green500 list.
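The efficiency figure follows directly from the two numbers in play. A minimal sketch, assuming the 95 petaflops peak from the headline and the 18 MW draw quoted above; note that it uses peak rather than measured Linpack performance, so it mirrors the article's estimate rather than an official Green500 score.

```python
PEAK_FLOPS = 95e15       # 95 petaflops, from the headline
POWER_WATTS = 18e6       # 18 MW total power draw

# Peak-based efficiency in gigaflops per watt.
gflops_per_watt = PEAK_FLOPS / POWER_WATTS / 1e9

print(f"{gflops_per_watt:.2f} gigaflops per watt")
```

That comes to roughly 5.3 gigaflops per watt, consistent with the "more than 5" figure.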

Ironically, the upgrade won’t improve the system’s position in the TOP500 rankings. The number one Sunway TaihuLight has a peak performance of 125.4 petaflops, and attains 93 petaflops on the High Performance Linpack (HPL) benchmark. It’s unlikely Tianhe-2A will come in at better than 70 or 80 petaflops on HPL.
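To see why an HPL result in the 70 to 80 petaflops range would leave the ranking unchanged, it helps to express the numbers as Linpack efficiency (HPL result divided by peak). This sketch assumes the 95 petaflops peak from the headline for Tianhe-2A; the function name is made up for illustration.

```python
def hpl_efficiency(hpl_pflops, peak_pflops):
    """Fraction of peak performance achieved on the HPL benchmark."""
    return hpl_pflops / peak_pflops

TIANHE_2A_PEAK = 95.0    # petaflops, from the headline
TAIHULIGHT_PEAK = 125.4  # petaflops
TAIHULIGHT_HPL = 93.0    # petaflops

taihulight_eff = hpl_efficiency(TAIHULIGHT_HPL, TAIHULIGHT_PEAK)

# Efficiency implied by the article's 70 and 80 petaflops estimates.
low_eff = hpl_efficiency(70.0, TIANHE_2A_PEAK)
high_eff = hpl_efficiency(80.0, TIANHE_2A_PEAK)

# Efficiency Tianhe-2A would need just to match TaihuLight's HPL score.
required_eff = hpl_efficiency(TAIHULIGHT_HPL, TIANHE_2A_PEAK)

print(f"TaihuLight:          {taihulight_eff:.1%}")
print(f"Tianhe-2A estimate:  {low_eff:.1%} to {high_eff:.1%}")
print(f"Needed to tie:       {required_eff:.1%}")
```

TaihuLight runs at about 74 percent of peak, and the article's estimate puts Tianhe-2A between about 74 and 84 percent. Matching TaihuLight's 93 petaflops would take close to 98 percent of peak.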