Fujitsu K supercomputer hits a record 10.51 petaflops, plus the supercomputer race in the United States and China

HPCWire – Just three and a half years after IBM broke the petaflop barrier with its Roadrunner supercomputer, Fujitsu’s “K computer” has passed the 10-petaflop mark. Fujitsu and RIKEN announced on Tuesday that they have completed the final build-out of the system and achieved 10.51 petaflops on Linpack, reaching a major milestone of Japan’s Next-Generation Supercomputing Project.

The completed K system, housed at RIKEN’s Advanced Institute for Computational Science in Kobe, is powered by more than 88,000 SPARC64 VIIIfx CPUs. The 8-core SPARC64 VIIIfx chip was purpose-built for HPC, delivering 128 peak gigaflops at 2.0 GHz while drawing a relatively modest 58 watts. Although each CPU represents a single node, four of the SPARC chips are glued to a single motherboard, and 24 of these boards make up a rack. The whole system comprises 864 of these racks.
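The per-chip figures above can be sanity-checked with simple arithmetic. The Python sketch below uses only the numbers quoted in the paragraph (8 cores, 2.0 GHz, 128 peak gigaflops, 58 watts); the helper names are illustrative and do not come from any Fujitsu tool. Dividing the peak by cores times clock gives the number of floating-point operations each core must complete per cycle to reach the quoted peak.

```python
# Sanity check of the SPARC64 VIIIfx per-chip figures quoted in the article.
# All constants come from the text; the helper names are illustrative only.

CORES = 8            # cores per chip
CLOCK_GHZ = 2.0      # clock speed in GHz
PEAK_GFLOPS = 128.0  # quoted peak gigaflops per chip
WATTS = 58.0         # quoted power draw per chip


def flops_per_cycle_per_core(peak_gflops: float, cores: int, clock_ghz: float) -> float:
    """Floating-point operations each core must complete per clock to reach peak."""
    return peak_gflops / (cores * clock_ghz)


def gflops_per_watt(peak_gflops: float, watts: float) -> float:
    """Peak gigaflops delivered per watt of chip power."""
    return peak_gflops / watts


if __name__ == "__main__":
    print(f"{flops_per_cycle_per_core(PEAK_GFLOPS, CORES, CLOCK_GHZ):.1f} flops/cycle/core")
    print(f"{gflops_per_watt(PEAK_GFLOPS, WATTS):.2f} peak gigaflops per watt")
```

Running it prints 8.0 flops per cycle per core and about 2.21 peak gigaflops per watt, which is what "purpose-built for HPC" looks like in numbers: wide floating-point units at a moderate clock.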

Peak performance for the final system is a whopping 11.28 petaflops, and thanks to Fujitsu’s 6D Tofu interconnect, the system was able to squeeze better than 93 percent Linpack efficiency out of its floating-point hardware, a rather remarkable feat. Total time for the Linpack run: 29 hours and 28 minutes.
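The efficiency claim is just the ratio of measured Linpack performance (Rmax) to theoretical peak (Rpeak). The Python sketch below computes that ratio from the article's 10.51 and 11.28 petaflop figures, the chip count implied by dividing the peak by 128 gigaflops per chip, and the rough number of operations performed over the 29-hour, 28-minute run. Helper names are illustrative; because the inputs are rounded, the outputs are approximate.

```python
# Back-of-the-envelope Linpack figures for the completed K computer.
# Constants come from the article; helper names are illustrative only.

RPEAK_PF = 11.28          # theoretical peak, petaflops
RMAX_PF = 10.51           # measured Linpack result, petaflops
PER_CHIP_PEAK_GF = 128.0  # peak gigaflops per SPARC64 VIIIfx chip
RUN_SECONDS = 29 * 3600 + 28 * 60  # 29 hours 28 minutes


def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak sustained during the Linpack run."""
    return rmax / rpeak


def implied_chip_count(rpeak_pf: float, per_chip_gf: float) -> float:
    """Chips needed to reach the quoted peak (1 petaflop = 1e6 gigaflops)."""
    return rpeak_pf * 1e6 / per_chip_gf


def total_operations(rmax_pf: float, seconds: int) -> float:
    """Floating-point operations performed at the sustained rate over the run."""
    return rmax_pf * 1e15 * seconds


if __name__ == "__main__":
    print(f"Linpack efficiency: {linpack_efficiency(RMAX_PF, RPEAK_PF):.1%}")
    print(f"Implied chip count: {implied_chip_count(RPEAK_PF, PER_CHIP_PEAK_GF):,.0f}")
    print(f"Operations in the run: {total_operations(RMAX_PF, RUN_SECONDS):.2e}")
```

This gives about 93.2 percent efficiency, roughly 88,125 chips (consistent with "more than 88,000 CPUs"), and on the order of 1.1 × 10^21 floating-point operations for the full run.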

Japan’s “K Computer” does 10 quadrillion calculations a second (Photo: RIKEN)

The K is destined for all sorts of big science workloads, including nanotechnology simulations, drug discovery, materials design, climate prediction, industrial design, and cosmology, among others. The multi-petaflop capabilities of the machine should enable some of these applications to push the envelope in their respective domains.

Among the US systems aiming for the 10-petaflop mark is the Jaguar supercomputer upgrade at Oak Ridge National Lab (ORNL), which will result in a 10- to 20-petaflop system. That machine, which will be renamed “Titan,” will be outfitted with the next-generation “Kepler” GPUs from NVIDIA, but that work isn’t expected to be completed until late 2012. The first phase of the upgrade, which involves plugging 960 Fermi-class GPUs into the machine, is already in motion and is expected to be completed this year. But it’s rather unlikely those initial enhancements will yield anything approaching 10 petaflops.

Other leading-edge petascale machines include the two big IBM Blue Gene/Q systems headed for US DOE centers: “Mira,” a 10-petaflop system destined for Argonne National Lab, and “Sequoia,” a 20-petaflop machine that will be installed at Lawrence Livermore. Neither of these Blue Genes is expected to be operational until 2012.

Then there is the 10-petaflop Dell-built cluster for TACC, named “Stampede.” That machine will rely on Intel’s Many Integrated Core (MIC) coprocessor to provide most of the flops, and since the first production MIC chip (“Knights Corner”) won’t be available for at least a year, the system won’t be up and running until late 2012.

China has developed a petaflop supercomputer using domestically developed chips.

The Sunway BlueLight MPP supercomputer doesn’t use microprocessors from Intel or AMD. It uses a chip designed by the Chinese themselves — and it’s not the Chinese microprocessor the supercomputing community was expecting. In other words, the Chinese are developing two microprocessors that could shift not only bragging rights in the worldwide supercomputer game, but the general market for server silicon.

Before the Sunway was uncloaked, Jack Dongarra, a University of Tennessee computer scientist and one of the founders of the Top500 supercomputer list, was expecting China to reveal a computing cluster based on an eight-core chip its engineers were developing under the “Loongson” or “Godson” name. Instead, the Sunway uses a previously unknown chip dubbed the “ShenWei SW-3.” Harnessing 8,700 of these chips, the cluster can, in theory, handle more than 1,000 trillion calculations a second, aka a petaflop.
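Dividing the Sunway's theoretical petaflop by its chip count shows roughly how much each ShenWei chip must contribute. The Python sketch below uses only the article's figures (10^15 calculations per second, 8,700 chips, about 1 GHz); since the article says "more than" a petaflop and "about" 1 GHz, treat the results as a rough lower bound rather than a chip specification. Helper names are illustrative.

```python
# Rough per-chip throughput implied by the Sunway BlueLight figures.
# Constants come from the article; helper names are illustrative only.

PETAFLOP = 1e15      # 1,000 trillion floating-point operations per second
SW_CHIPS = 8_700     # ShenWei chips in the cluster
SW_CLOCK_GHZ = 1.0   # "about 1GHz", so results are approximate


def gflops_per_chip(total_flops: float, chips: int) -> float:
    """Average theoretical gigaflops each chip contributes."""
    return total_flops / chips / 1e9


def flops_per_clock(gflops: float, clock_ghz: float) -> float:
    """Floating-point operations per clock cycle for a whole chip."""
    return gflops / clock_ghz


if __name__ == "__main__":
    per_chip = gflops_per_chip(PETAFLOP, SW_CHIPS)
    print(f"At least ~{per_chip:.0f} gigaflops per chip")
    print(f"~{flops_per_clock(per_chip, SW_CLOCK_GHZ):.0f} flops per clock per chip")
```

That works out to roughly 115 gigaflops per chip, or about 115 operations per clock at 1 GHz. For comparison, the SPARC64 VIIIfx figures quoted earlier (128 gigaflops at 2.0 GHz) imply 64 operations per clock per chip, so the ShenWei chip would need to do considerably more work per cycle to make up for its lower clock.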

According to reports from Jinan, the ShenWei microprocessor was designed at a supercomputing institute in China and manufactured in Shanghai, and it uses a new instruction set rather than the venerable x86 instruction set used by Intel and AMD. The chip runs at about 1 GHz, well below the clock speed of the latest Intel and AMD chips, but the lower clock also means it consumes less power.
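The link between clock speed and power follows from the standard first-order model of dynamic power in CMOS logic. It is included here as general background, not as a ShenWei-specific measurement; the symbols are the textbook ones rather than figures from the article: α is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f
```

At a fixed voltage, power falls linearly with f. Because a lower clock usually also permits a lower supply voltage, and voltage enters squared, the savings in practice can be larger than the frequency ratio alone suggests.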

In revealing the Sunway, which was apparently installed in Jinan in September, the Chinese also unveiled a list of the country’s top 100 supercomputers. Eighty-five still use Intel and fourteen use AMD, but the plan is to move the lot toward homegrown hardware. The supercomputing game isn’t a huge part of Intel’s or AMD’s business, but there’s a certain amount of prestige wrapped up in these massive machines, and the same chips can be used in ordinary servers.

“Don’t think of this in terms of supercomputing,” says Dongarra. “There’s a low end where these chips can work. You can imagine these chips replacing all the Intel chips in China.”
