Multi-Petaflop Supercomputers Now to 2011

Fujitsu 10 Petaflops by early 2011

Fujitsu is building the supercomputer for Japan’s Institute of Physical and Chemical Research, known as RIKEN, said Takumi Maruyama, head of Fujitsu’s processor development department, on the sidelines of the Hot Chips conference at Stanford University on Tuesday.

The system will be based on Fujitsu’s upcoming Sparc64 VIIIfx processor, which has eight processor cores and will be an update to the four-core Sparc64 VII chip that Fujitsu released two years ago, Maruyama said.

A prior nextbigfuture article on the Fujitsu supercomputer

Blue Waters Up to 10 petaflops in 2011
Blue Waters is a petascale supercomputer being designed and built jointly by the National Center for Supercomputing Applications (NCSA), the University of Illinois at Urbana-Champaign, and IBM. Scheduled for completion in 2011, it is expected to run science and engineering codes at sustained speeds of at least one petaflops, or one quadrillion floating point operations per second. This is nearly four times faster than IBM’s Blue Gene/L. One source has stated that Blue Waters may hit a peak system speed of 10 petaflops.

The University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications (NCSA) announced today that they have finalized their contract with IBM to build the world’s first sustained petascale computational system dedicated to open scientific research. This leadership-class project, called Blue Waters, is supported by a $208 million grant from the National Science Foundation and will come online in 2011.

Upgraded Jaguar

New six-core “Istanbul” processors from AMD are being installed now and will rev up Jaguar’s peak processing capability to “well over 2 petaflops” by the end of 2009. That’s more than 2,000 trillion mathematical calculations per second.

The switch from quad-core to six-core processors will result in about a 70 percent performance gain and should also improve memory performance for applications. So Jaguar’s 1.6 petaflop peak should rise to about 2.7 petaflops.
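The arithmetic behind that estimate can be checked with a few lines of Python. This is a minimal sketch that assumes the roughly 70 percent gain applies directly to peak petaflops; the function and variable names are made up for illustration.

```python
# Back-of-the-envelope check of the Jaguar upgrade figures quoted above.
# Assumption: the ~70 percent gain applies directly to peak petaflops.

def scaled_peak(current_pflops: float, gain_fraction: float) -> float:
    """Return peak performance after a fractional performance gain."""
    return current_pflops * (1.0 + gain_fraction)

jaguar_quad_core_pflops = 1.6  # peak before the upgrade, from the article
istanbul_gain = 0.70           # "about a 70 percent performance gain"

jaguar_six_core_pflops = scaled_peak(jaguar_quad_core_pflops, istanbul_gain)
print(f"Estimated six-core peak: {jaguar_six_core_pflops:.2f} petaflops")
```

The result comes out at roughly 2.7 petaflops, matching the figure quoted for the upgraded system.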

AMD believes that a 12-core processor, code-named Magny-Cours, planned for 2010 will give it a major advantage. That may be why Advanced Micro Devices Inc. said this week that it is releasing its six-core Opteron chip in June, well ahead of schedule, and plans to follow it early next year with Magny-Cours, which will ship in eight- and 12-core models. After that, it plans a 16-core chip in 2011. If Jaguar received a chip refresh in 2010, it could reach 5 petaflops of peak performance, and a 2011 refresh could provide 7 petaflops.
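The article does not spell out how the 5 and 7 petaflop projections were reached. One rough way to land near them is to assume that peak performance scales linearly with cores per socket, with clock speed and socket count unchanged. That scaling rule is an assumption made here for illustration, not something stated by AMD or Oak Ridge, and the names below are hypothetical.

```python
# Rough projection of Jaguar's peak under hypothetical chip refreshes.
# Assumption (not from the source): peak scales linearly with cores per
# socket, with clock speed and socket count held constant.

def peak_after_refresh(base_pflops: float, base_cores: int, new_cores: int) -> float:
    """Scale a peak figure by the ratio of new to old cores per socket."""
    return base_pflops * new_cores / base_cores

six_core_peak_pflops = 2.7  # estimated peak with six-core chips, from the article

peak_2010_pflops = peak_after_refresh(six_core_peak_pflops, 6, 12)  # 12-core Magny-Cours
peak_2011_pflops = peak_after_refresh(six_core_peak_pflops, 6, 16)  # planned 16-core chip

print(f"2010 refresh (12 cores): {peak_2010_pflops:.1f} petaflops")
print(f"2011 refresh (16 cores): {peak_2011_pflops:.1f} petaflops")
```

Under that assumption the refreshes give roughly 5.4 and 7.2 petaflops, in the same range as the 5 and 7 petaflop figures above. Real chips rarely keep the same clock speed as core counts rise, so the rounder, lower numbers are plausible.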

Sequoia: IBM and DOE Building a 20 Petaflop System for 2011

IBM has promised the DOE National Nuclear Security Administration a 20 petaflop supercomputer that is scheduled for delivery in 2011. The supercomputer will be called Sequoia. The computer should be ten times more energy efficient per calculation than current supercomputers.

The Sequoia effort includes two generations of IBM Blue Gene supercomputers that will deliver the next generation of advanced systems to weapon simulation codes being developed under the ASC program. ASC is a cornerstone of the National Nuclear Security Administration’s (NNSA) Stockpile Stewardship program to ensure the safety, security and reliability of the nation’s nuclear deterrent without underground testing. These two Blue Gene systems are “Dawn,” a 500-teraflop system that was accepted by LLNL in March of 2009, and “Sequoia,” a 20-petaflop system based on future Blue Gene technology, slated for delivery in 2011. Lawrence Livermore has selected the TotalView debugger for the 20-petaflop system.

Among the features that TotalView Technologies will incorporate for the Dawn and Sequoia systems are user-programmable data display, fast conditional breakpoints and watchpoints, compiled expressions, asynchronous thread control, and full post-mortem debugging.

At 20 petaflops, Sequoia will be 34 times as powerful as LLNL’s current Blue Gene/L, giving scientists a lot more computing cycles for weapons simulations and basic science research. “Sequoia represents a major challenge to code developers as the multi-core era demands that we effectively absorb more cores and threads per MPI task,” said Mark Seager, Asst. Dept. Head for Advanced Computing Technology at LLNL. “This programming challenge can only be overcome with world class code development tools. Through our long-term partnership with cutting-edge technology companies like TotalView Technologies we are confident we can deliver on our demanding debugger scalability and usability requirements.”
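The ratios quoted for Sequoia can be cross-checked against the other figures in this section. The sketch below uses only numbers from the article (20 petaflops, Dawn’s 500 teraflops, and the 34x claim). The Blue Gene/L figure it prints is merely what the 34x ratio implies, not an official specification, and the variable names are hypothetical.

```python
# Cross-check of the Sequoia speedup figures quoted in the article.

def speedup(new_tflops: float, old_tflops: float) -> float:
    """Return how many times faster the new system is than the old one."""
    return new_tflops / old_tflops

sequoia_tflops = 20_000  # 20 petaflops expressed in teraflops
dawn_tflops = 500        # Dawn, accepted by LLNL in March 2009

sequoia_vs_dawn = speedup(sequoia_tflops, dawn_tflops)

# Implied Blue Gene/L performance if Sequoia is 34 times as powerful.
# This is derived from the quoted ratio, not an official figure.
implied_bgl_tflops = sequoia_tflops / 34

print(f"Sequoia vs. Dawn: {sequoia_vs_dawn:.0f}x")
print(f"Implied Blue Gene/L: about {implied_bgl_tflops:.0f} teraflops")
```

By the same arithmetic, Sequoia would be 40 times as powerful as Dawn, and the 34x claim implies a Blue Gene/L of a little under 600 teraflops.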

TotalView is a comprehensive source code analysis and memory error detection tool that dramatically enhances developer productivity by simplifying the process of debugging parallel, data-intensive, multi-process, multi-threaded or network-distributed applications. Built to handle the complexities of the world’s most demanding applications, TotalView offers a number of advanced features that help speed development and eliminate bugs quickly, and is capable of scaling to thousands of processes or threads with applications distributed over multiple machines or processors.

Cloud Computing and Distributed Supercomputing

Folding@home, the largest distributed computing effort, has 8.3 petaflops of computing power in active use.

This effort is one of the larger current “World Computers”: about 400,000 machines are active for Folding@home, of which 30,000 GPGPUs provide 68% of the processing and 36,600 PlayStation 3 consoles account for another 25%. If participation continues to grow, and if by 2011 almost all of about 1 million actively participating computers each deliver 1 teraflops, the combined power would be about 1 exaflops.
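The exaflop projection is simple unit arithmetic, sketched below. It assumes every one of the 1 million machines sustains a full teraflops, which is an idealized upper bound; the helper name is made up for illustration.

```python
# Unit arithmetic behind the exaflop projection for distributed computing.
# Assumption: every participating machine sustains a full 1 teraflops.

TERA = 10**12
PETA = 10**15
EXA = 10**18

def combined_flops(machines: int, flops_per_machine: int) -> int:
    """Total floating point operations per second across all machines."""
    return machines * flops_per_machine

projected_flops = combined_flops(1_000_000, 1 * TERA)
current_flops = 8.3 * PETA  # Folding@home today, from the article

growth_factor = projected_flops / current_flops

print(f"Projected total: {projected_flops / EXA:.1f} exaflops")
print(f"Growth over today's 8.3 petaflops: about {growth_factor:.0f}x")
```

One million machines at 1 teraflops each gives exactly 10^18 operations per second, or 1 exaflops, which is about 120 times the 8.3 petaflops in use today.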

Sun Microsystems claimed a new watermark for server CPUs, unveiling Rainbow Falls, a 16-core, 128-thread processor at the Hot Chips conference Tuesday (August 25). But analysts gave the IBM Power7 kudos as the more compelling achievement in the latest round of high-end server processors. Power7 packs as many as 32 cores supporting 128 threads on a four-chip module with links to handle up to 32 sockets in a system. “It is scaling well beyond anything we’ve ever really seen before,” said Peter Glaskowsky, a technology analyst for Envisioneering Group.