Developed by the National Research Center of Parallel Computer Engineering & Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi, Sunway TaihuLight displaces Tianhe-2, an Intel-based Chinese supercomputer that has claimed the No. 1 spot on the past six TOP500 lists.
Sunway TaihuLight, with 10,649,600 computing cores comprising 40,960 nodes, is more than twice as fast and three times as efficient as Tianhe-2, which posted a LINPACK performance of 33.86 petaflop/s. The peak power consumption under load (running the HPL benchmark) is 15.37 MW, which works out to about 6 Gflops/Watt. This allows the TaihuLight system to grab one of the top spots on the Green500 in terms of the Performance/Power metric. Titan, a Cray XK7 system installed at the Department of Energy’s (DOE) Oak Ridge National Laboratory, is now the No. 3 system. It achieved 17.59 petaflop/s.
Sunway TaihuLight is nearly three times faster than the previous #1 system, the Tianhe-2 supercomputer, which has moved to #2 after ruling the roost for some three years. TaihuLight is also about five times faster than Titan, the 17 Petaflop machine at ORNL, which is still the fastest machine in the USA.
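The ratios quoted above can be checked directly from the LINPACK and power figures. Here is a minimal Python sketch of that arithmetic; the function and variable names are made up for illustration, and the only inputs are the numbers given in the text.

```python
# Figures quoted in the article (LINPACK Rmax in petaflop/s, power in MW).
RMAX_PFLOPS = {
    "Sunway TaihuLight": 93.0,
    "Tianhe-2": 33.86,
    "Titan": 17.59,
}
TAIHULIGHT_POWER_MW = 15.37


def speedup(system: str, baseline: str) -> float:
    """Ratio of LINPACK performance between two systems."""
    return RMAX_PFLOPS[system] / RMAX_PFLOPS[baseline]


def gflops_per_watt(rmax_pflops: float, power_mw: float) -> float:
    """Convert petaflop/s and megawatts into Gflops/Watt.

    1 petaflop/s = 1e6 Gflop/s and 1 MW = 1e6 W.
    """
    return (rmax_pflops * 1e6) / (power_mw * 1e6)


if __name__ == "__main__":
    print(f"vs Tianhe-2: {speedup('Sunway TaihuLight', 'Tianhe-2'):.2f}x")
    print(f"vs Titan:    {speedup('Sunway TaihuLight', 'Titan'):.2f}x")
    efficiency = gflops_per_watt(RMAX_PFLOPS["Sunway TaihuLight"], TAIHULIGHT_POWER_MW)
    print(f"Efficiency:  {efficiency:.2f} Gflops/Watt")
```

The results come out at roughly 2.7x over Tianhe-2, 5.3x over Titan, and just over 6 Gflops/Watt, which matches the rounded claims in the text.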
Here are the specs:
Linpack: 93 Petaflops (Rmax)
Peak performance: 125.4 Petaflops (Rpeak)
Processor: Sunway SW26010 1.4 GHz processor
Cores per socket: 260
Instruction Set: RISC instruction set developed by Sunway
Interconnect: their TOP500 submission says “Sunway design” but Mellanox supplied the Host Channel Adapter (HCA) and switch chips. Sunway may not call it InfiniBand, but that is exactly what it is. China has political reasons for characterizing the overall system as domestic technology.
Cabinets: 40 water-cooled cabinets, each with about 3 Petaflops of peak performance
Power consumption: 15.37 Megawatts
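A few derived numbers follow from these specs. The Python sketch below is only a consistency check on the figures quoted in this article. It assumes one SW26010 socket per node, which is an inference from the totals (40,960 nodes and 10,649,600 cores) rather than something stated in the spec list, and the variable names are illustrative.

```python
# Spec figures quoted in the article.
NODES = 40_960
CORES_PER_SOCKET = 260
SOCKETS_PER_NODE = 1  # assumption: inferred from the quoted core total
RMAX_PFLOPS = 93.0    # Linpack
RPEAK_PFLOPS = 125.4  # theoretical peak
CABINETS = 40

# Total core count implied by the node and socket figures.
total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET

# Fraction of theoretical peak achieved on the Linpack benchmark.
hpl_efficiency = RMAX_PFLOPS / RPEAK_PFLOPS

# Peak performance per cabinet, in petaflops.
peak_per_cabinet = RPEAK_PFLOPS / CABINETS

if __name__ == "__main__":
    print(f"Total cores:       {total_cores:,}")
    print(f"Linpack vs peak:   {hpl_efficiency:.1%}")
    print(f"Peak per cabinet:  {peak_per_cabinet:.2f} Petaflops")
```

Under that assumption the core count reproduces the quoted 10,649,600, Linpack reaches about 74% of peak, and each cabinet carries a little over 3 Petaflops.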
It is an unbalanced, floating-point heavy architecture that has no cache and not a whole lot of memory per core. It has been compared to Blue Gene/L.
There are three Gordon Bell submissions based on the new Sunway TaihuLight system.
These three applications are:
(1) a fully-implicit nonhydrostatic dynamic solver for cloud-resolving atmospheric simulation;
(2) a highly effective global surface wave numerical simulation with ultra-high resolution;
(3) large scale phase-field simulation for coarsening dynamics based on Cahn-Hilliard equation with degenerated mobility.
All three of these applications have scaled to around 8 million cores (close to the full system scale). The applications that use an explicit method (such as the wave simulation and the phase-field simulation) have achieved a sustained performance of 30 to 40 PFlops. In contrast, the implicit solver achieves a sustained performance of around 1.5 PFlops, with a good convergence rate for large-scale problems. These performance numbers may be improved before the SC16 Conference in November 2016.
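To put those sustained numbers in context, here is a rough Python sketch comparing them with the full-system peak of 125.4 Petaflops quoted earlier. The core count of 8,000,000 stands in for "around 8 million" and is an approximation, and the helper name is made up for illustration. Since the runs used only part of the machine, the fractions below understate the efficiency on the cores actually used.

```python
RPEAK_PFLOPS = 125.4      # full-system theoretical peak quoted in the article
TOTAL_CORES = 10_649_600  # full-system core count quoted in the article
CORES_USED = 8_000_000    # approximation of "around 8 million cores"


def fraction_of_peak(sustained_pflops: float) -> float:
    """Sustained performance as a fraction of full-system peak."""
    return sustained_pflops / RPEAK_PFLOPS


# Share of the machine's cores used by the Gordon Bell runs.
core_share = CORES_USED / TOTAL_CORES

# Explicit methods: 30 to 40 PFlops sustained.
explicit_low = fraction_of_peak(30.0)
explicit_high = fraction_of_peak(40.0)

# Implicit solver: around 1.5 PFlops sustained.
implicit = fraction_of_peak(1.5)

if __name__ == "__main__":
    print(f"Cores used:       {core_share:.0%} of the system")
    print(f"Explicit methods: {explicit_low:.0%} to {explicit_high:.0%} of peak")
    print(f"Implicit solver:  {implicit:.1%} of peak")
```

By this estimate the explicit codes sustain roughly a quarter to a third of full-system peak, while the implicit solver sits at a little over 1%.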
Jack Dongarra has published a report on the Sunway TaihuLight system.
SOURCES - Top500, HpcWire