One technology Lawrence Berkeley National Lab researchers are looking at is the configurable processor technology developed by Tensilica Inc. The company offers a set of tools that system developers can employ to design both the SoC and the processor cores themselves. As a real-world implementation of this technology, LBNL researchers estimate that a 10 petaflop peak system built with Tensilica cores would draw only 3 megawatts and cost just $75 million. It’s not a general-purpose system, but neither is it a one-off machine for a single application (like Japan’s MD-GRAPE machine, for example). For comparison, a 10 petaflop Opteron-based system was estimated to cost $1.8 billion and require 179 megawatts to operate; the corresponding Blue Gene/L system would cost $2.6 billion and draw 27 megawatts. Extrapolating the half-petaflop Barcelona-based Ranger supercomputer to 10 petaflops, it would require about 50 megawatts and cost $600 million (although it’s widely assumed that Sun discounted the Ranger price significantly). A 10 petaflop Blue Gene/P system would draw 20 megawatts, at perhaps a similar cost to the Blue Gene/L.
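The quoted estimates are easier to compare on a per-petaflop basis. The figures below are the article's numbers; the script just divides them out (treating the Blue Gene/P cost as "similar to BG/L", as the text suggests):

```python
# Rough per-petaflop comparison of the ~10 petaflop (peak) system estimates
# quoted above. Totals are the article's figures; only the division is new.
systems = {
    # name: (total cost in $M, total power in MW) for ~10 PF peak
    "Tensilica (LBNL est.)": (75, 3),
    "Opteron cluster":       (1800, 179),
    "Blue Gene/L":           (2600, 27),
    "Ranger (extrapolated)": (600, 50),
    "Blue Gene/P":           (2600, 20),  # "perhaps a similar cost" as BG/L
}

for name, (cost_m, power_mw) in systems.items():
    print(f"{name:24s} ${cost_m / 10:6.1f}M per PF, {power_mw / 10:5.1f} MW per PF")
```

The gap is stark either way you slice it: the Tensilica design comes out roughly 8-35x cheaper per peak petaflop and 7-60x more power-efficient than the alternatives listed.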
– AMD Opteron: Commodity approach – lower efficiency for scientific applications
offset by the cost efficiencies of the mass market
• Popular building block for HPC, from commodity to tightly-coupled XT3.
• Our AMD pricing is based on servers only without interconnect
– BlueGene/L: Use generic embedded processor core and customize System on Chip
(SoC) services around it to improve power efficiency for scientific applications
• Power efficient approach, with high concurrency implementation
• BG/L SoC includes logic for the interconnect network
– Tensilica: In addition to customizing the SoC, also customizes the CPU core for
further power efficiency benefits but maintains programmability
• Design includes custom chip, fabrication, raw hardware, and interconnect
10 petaflops of sustained (rather than peak) performance would cost 10-20 times more, but with Moore’s Law that level of performance should be available for the same price in about 5 years.
So by 2012-2013, a 100-200 petaflop peak supercomputer based on configurable processors could be built for $75 million, and an exaflop supercomputer would be in the $375-750 million range.
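The Moore's Law extrapolation above can be checked with a quick calculation, assuming price/performance doubles every 18 months (the 100-200 PF range in the text corresponds to doubling periods between roughly 18 and 24 months):

```python
# Sketch of the extrapolation: if price/performance doubles every 18 months,
# a fixed $75M budget buys 2**(t/1.5) times as many peak flops after t years.
def flops_multiplier(years, doubling_period_years=1.5):
    return 2 ** (years / doubling_period_years)

# ~5 years out (2012-2013 from ~2008): roughly a 10x improvement,
# so $75M buys ~100 PF instead of 10 PF.
mult = flops_multiplier(5)
print(f"{mult:.1f}x improvement -> ~{10 * mult:.0f} PF peak for $75M")

# An exaflop (1000 PF) at that point costs 1000 / (10 * mult) times $75M,
# which lands near the upper end of the $375-750M range in the text.
cost_exaflop_m = 75 * 1000 / (10 * mult)
print(f"exaflop cost: ~${cost_exaflop_m:.0f}M")
```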
10 petaflop supercomputer by 2012-2013
Petaflop personal computers and wearable computing 2016-2018
Personal petaflop machines seem likely to come about from better GPGPUs, FPGAs and the mainstreaming of configurable components.
Another breakthrough would put four times as much memory into cheaper servers; high performance applications need that additional memory.
MetaSDRAM is a drop-in solution that closes the gap between processor computing power, which doubles every 18 months, and DRAM capacity, which doubles only every 36 months. Until now, the industry has addressed this gap by adding higher-capacity, but less readily available and exponentially more expensive, DRAM chips to each dual in-line memory module (DIMM) on the motherboard.
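Those two doubling rates imply that the compute-to-memory ratio itself doubles every 36 months, so the gap compounds quickly. A quick calculation from the figures above:

```python
# Compute doubles every 18 months, DRAM capacity every 36 months,
# so the ratio of flops to DRAM bytes doubles every 36 months.
def growth(years, doubling_months):
    return 2 ** (years * 12 / doubling_months)

for years in (3, 6, 9):
    gap = growth(years, 18) / growth(years, 36)
    print(f"after {years} years, the compute/DRAM gap grows {gap:.0f}x")
```

After just six years the gap has grown fourfold, which is why a one-time 4x capacity boost per DIMM buys the industry the "2-4 years" cited below.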
The MetaSDRAM chipset, which sits between the memory controller and the DRAM, solves the memory capacity problem cost effectively by enabling up to four times more mainstream DRAMs to be integrated into existing DIMMs without the need for any hardware or software changes. The chipset makes multiple DRAMs look like a larger capacity DRAM to the memory controller. The result is “stealth” high-capacity memory that circumvents the normal limitations set by the memory controller. This new technology has accelerated memory technology development by 2-4 years.
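The text does not spell out MetaSDRAM's internals, but the general idea of a buffer chip presenting several physical DRAMs as one larger device can be sketched as a simple address translation. Everything here (the row count, the function name) is an illustrative assumption, not MetaSDRAM's actual design:

```python
# Illustrative sketch only (NOT MetaSDRAM's actual design): a buffer chip
# between the memory controller and the DRAMs can present four physical
# devices as one device with 4x the rows, by steering the high-order
# logical row bits to a chip select.
PHYS_ROWS = 16384  # rows per physical DRAM (assumed for illustration)

def translate(logical_row):
    """Map a logical row in the 4x device to (chip index, physical row)."""
    chip = logical_row // PHYS_ROWS   # which of the 4 DRAMs to select
    row = logical_row % PHYS_ROWS     # row within that DRAM
    return chip, row

print(translate(0))        # (0, 0)
print(translate(16384))    # (1, 0)
print(translate(65535))    # (3, 16383)
```

From the memory controller's side, nothing changes: it addresses one large device, which is what lets the DIMM work "without the need for any hardware or software changes."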
A research paper on the IBM Kittyhawk project describes an effort to build a global-scale computer. IBM wants to use supercomputers to handle many kinds of large-scale applications more efficiently than clusters of commodity boxes can.
A glimpse of how this might take shape was revealed in a recent IBM Research paper that described using the Blue Gene/P supercomputer as a hardware platform for the Internet. The authors of the paper point to Blue Gene’s exceptional compute density, highly efficient use of power, and superior performance per dollar. Regarding the drawbacks of the current infrastructure of the Internet, the authors write:
At present, almost all of the companies operating at web-scale are using clusters of commodity computers, an approach that we postulate is akin to building a power plant from a collection of portable generators. That is, commodity computers were never designed to be efficient at scale, so while each server seems like a low-price part in isolation, the cluster in aggregate is expensive to purchase, power and cool in addition to being failure-prone.
The IBM researchers are certainly talking about a more general-purpose petascale application than the Berkeley researchers, but one aspect is the same: ditch the loosely coupled, commodity-based systems in favor of a tightly coupled, customized architecture that focuses on low power and high throughput. If this is truly the model that emerges for ultra-scale computing, then the whole industry is in for a wild ride.