Throughput processors have hundreds of cores today and will have thousands of cores by 2015. Performance scaling of single-thread processors stopped in 2002, Dally said, following a period when the industry derived a performance increase of 52 percent per year for more than 20 years. But throughput-optimized processors like graphics processing units (GPUs) are still improving by greater than 70 percent per year.
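To put those yearly rates in perspective, here is a minimal sketch of the compounding arithmetic. The function name and the six-year window (2009 to 2015) are assumptions made for illustration. The talk gave the rates only as lower bounds ("more than 20 years", "greater than 70 percent"), so the results are rough floors rather than measured figures.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Return the cumulative multiplier after compounding a yearly rate."""
    return (1.0 + rate_per_year) ** years


# 52 percent per year sustained for 20 years (the single-thread era cited above).
single_thread_gain = compound_growth(0.52, 20)

# 70 percent per year for 6 years (assumed window: 2009 to 2015).
throughput_gain = compound_growth(0.70, 6)

print(f"Single-thread, 20 years at 52%/yr: roughly {single_thread_gain:,.0f}x")
print(f"Throughput, 6 years at 70%/yr: roughly {throughput_gain:.1f}x")
```

At 52 percent per year, performance doubles roughly every 20 months, which is why two decades of it compounds to a gain in the thousands.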
Interconnect is a dominant factor in power consumption, according to Dally. The emergence of optical interconnect technology may play a role, “but don’t hold your breath,” Dally said, citing technical issues.
Delivering a keynote address here at the Design Automation Conference Wednesday (July 29), William J. Dally, chief scientist and senior vice president of research at Nvidia and an engineering professor at Stanford University, said computing is entering a world where performance increases are derived from parallelism and efficiency is determined by locality.
Bull’s Novascale supercomputer currently uses 96 Nvidia GPGPUs, so a projected upgrade to 2015 chips and components would mean a peak of 2 petaflops from a 96-GPGPU machine. The GPGPU cost would be in the range of $100,000 to $200,000. Current Nvidia GPGPU cards were introduced at price points of $1,000 to $2,000.
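The cost range and the per-GPU throughput implied by that projection can be checked with a short calculation. This is only a sketch: the constant and function names are hypothetical, and it assumes the 2015 cards would launch at the same price points as the current ones.

```python
GPU_COUNT = 96                 # GPGPUs in the machine, per the text
CARD_PRICE_LOW_USD = 1_000     # low launch price point, per the text
CARD_PRICE_HIGH_USD = 2_000    # high launch price point, per the text
PROJECTED_PEAK_PFLOPS = 2.0    # projected 2015 peak, per the text


def total_cost_range(count: int, low: int, high: int) -> tuple[int, int]:
    """Total GPGPU cost at the low and high per-card price points."""
    return count * low, count * high


def per_gpu_tflops(total_pflops: float, count: int) -> float:
    """Peak teraflops each GPU must supply to reach the projected total."""
    return total_pflops * 1000.0 / count


low, high = total_cost_range(GPU_COUNT, CARD_PRICE_LOW_USD, CARD_PRICE_HIGH_USD)
print(f"GPGPU cost: ${low:,} to ${high:,}")
print(f"Implied peak per GPU: {per_gpu_tflops(PROJECTED_PEAK_PFLOPS, GPU_COUNT):.1f} teraflops")
```

The totals come to $96,000 and $192,000, which the text rounds to $100,000 and $200,000. Each 2015 GPU would need to deliver a peak of roughly 21 teraflops.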
The computer’s design has been updated with eight nodes of Nvidia Tesla S900 GPGPU cards featuring 96 GPUs in the GT-200 family. The company estimates that each Tesla card can deliver 1.1 teraflops of computing power. Together, the 96 GPUs can achieve about 54 percent of the performance delivered by 1,068 8-core Nehalem processors.
IBM looks set to create 2U systems with four of the dual-chip modules, giving the server 64 cores. These 2U systems will support up to 128GB of memory and hit 2 teraflops.
2009: Tick: Westmere – 32nm, 6-cores with HT, IMC and QPI
2010: Tock: Sandy Bridge – 32nm, 8-cores with HT, IMC, QPI, and the revolutionary new AVX game instruction technology
2011: Tick: Ivy Bridge – 22nm
2012: Tock: Haswell – 22nm, with a native 8-core design for on-die data and instruction caching along with integrated vector co-processor