Delivering a keynote address at the Design Automation Conference on Wednesday (July 29), William J. Dally, chief scientist and senior vice president of research at Nvidia and an engineering professor at Stanford University, said computing is entering a world where performance increases are derived from parallelism and efficiency is determined by locality.
Throughput processors have hundreds of cores today and will have thousands of cores by 2015, Dally said. Performance scaling of single-thread processors stopped in 2002, following a period of more than 20 years in which the industry gained 52 percent per year. But throughput-optimized processors such as graphics processing units (GPUs) are still improving by more than 70 percent per year.
Interconnect is a dominant factor in power consumption, according to Dally. Optical interconnect technology may eventually play a role, “but don’t hold your breath,” Dally said, citing technical issues.
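To get a feel for what the growth rates quoted from the keynote imply, here is a minimal Python sketch that compounds them. The 52 percent and 70 percent figures are from the talk; the six-year horizon (roughly from the keynote to 2015) and the function names are my own assumptions for illustration, not something Dally presented.

```python
import math


def compound_growth(rate_per_year: float, years: float) -> float:
    """Total improvement factor after compounding a yearly rate."""
    return (1.0 + rate_per_year) ** years


def doubling_time_years(rate_per_year: float) -> float:
    """Years needed to double performance at a given yearly rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


# Rates quoted in the keynote.
SINGLE_THREAD_RATE = 0.52  # 52 percent per year, sustained for about 20 years
GPU_RATE = 0.70            # "greater than 70 percent per year" (lower bound)

# Assumption: about six years between the keynote and 2015.
YEARS_TO_2015 = 6

single_thread_20y = compound_growth(SINGLE_THREAD_RATE, 20)
gpu_to_2015 = compound_growth(GPU_RATE, YEARS_TO_2015)

print(f"Single-thread gain over 20 years: {single_thread_20y:,.0f}x")
print(f"GPU gain over {YEARS_TO_2015} years at 70%/yr: {gpu_to_2015:.1f}x")
print(f"GPU doubling time: {doubling_time_years(GPU_RATE):.2f} years")
```

Under these assumptions, 70 percent per year compounds to roughly a 24-fold gain in six years, while 20 years at 52 percent compounds to a gain of a few thousand times.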
Bull’s Novascale supercomputer currently uses 96 Nvidia GPGPUs, so a projected upgrade to 2015 chips and components would mean 2 petaflops of peak performance from a 96-GPGPU machine. The GPGPUs would cost in the range of $100,000 to $200,000 in total, since current Nvidia GPGPU cards were introduced at the $1,000 to $2,000 price points.
The computer’s design has been updated with eight nodes of Nvidia Tesla S900 GPGPU cards featuring 96 GPUs in the GT-200 family. The company estimates that each Tesla card can deliver 1.1 teraflops of computing power. Together, the 96 GPUs can achieve about 54 percent of the performance delivered by 1068 8-core Nehalem processors.
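The arithmetic behind the Novascale numbers can be sketched in a few lines of Python. The GPU count, the 1.1 teraflops estimate, the card price points and the 2 petaflops projection come from the text above. Treating each of the 96 GPUs as a 1.1 teraflops device, and the variable names themselves, are my assumptions, so read the output as a rough check rather than a vendor figure.

```python
GPU_COUNT = 96
TFLOPS_PER_GPU_TODAY = 1.1        # assumption: 1.1 teraflops applies per GPU
CARD_PRICE_RANGE_USD = (1000, 2000)
TARGET_PEAK_TFLOPS_2015 = 2000.0  # 2 petaflops expressed in teraflops

# Aggregate peak of the current 96-GPU configuration.
current_peak_tflops = GPU_COUNT * TFLOPS_PER_GPU_TODAY

# Total GPGPU cost if 2015 cards launch at the same price points.
cost_range_usd = tuple(GPU_COUNT * price for price in CARD_PRICE_RANGE_USD)

# Per-GPU throughput a 2015 part would need to reach 2 petaflops with 96 GPUs.
required_tflops_per_gpu = TARGET_PEAK_TFLOPS_2015 / GPU_COUNT

# Improvement factor over the current parts implied by the projection.
required_speedup = TARGET_PEAK_TFLOPS_2015 / current_peak_tflops

print(f"Current aggregate peak: {current_peak_tflops:.1f} teraflops")
print(f"GPGPU cost range: ${cost_range_usd[0]:,} to ${cost_range_usd[1]:,}")
print(f"Needed per GPU in 2015: {required_tflops_per_gpu:.1f} teraflops")
print(f"Implied speedup over current GPUs: {required_speedup:.1f}x")
```

The cost range works out to $96,000 to $192,000, which matches the rounded $100,000 to $200,000 figure, and the projection implies 2015 GPUs at roughly 20 teraflops each.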
In 2010, IBM will ship Power7 at 4.0GHz on a 45nm process.
IBM looks set to create 2U systems with four of the dual-chip modules, giving the server 64 cores. These 2U systems will support up to 128GB of memory and hit 2 teraflops.
A 12-page view of Intel to 2015 (from 2005)
A more recent 16-page technical paper from Intel on its Larrabee chips (32 to 64 cores)
Intel’s chip roadmap until 2012.
2009: Tick: Westmere – 32nm, 6 cores with HT, IMC and QPI
2010: Tock: Sandy Bridge – 32nm, 8 cores with HT, IMC, QPI, and the revolutionary new AVX game instruction technology
2011: Tick: Ivy Bridge – 22nm
2012: Tock: Haswell – 22nm, with a native 8-core design for on-die data and instruction caching along with integrated vector co-processor
Fujitsu’s Venus chip will deliver 128 gigaflops, and IBM’s Power7 will deliver 256 gigaflops in 2010.
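The Power7 figures quoted in this post can be cross-checked against each other with a short Python sketch. The inputs (four dual-chip modules, 64 cores, 4.0GHz, 256 gigaflops per chip, 2 teraflops per 2U server) are from the text. The derived values, including the flops per core per cycle, are just what those numbers imply and are not an IBM specification.

```python
MODULES_PER_SERVER = 4     # four dual-chip modules per 2U system
CHIPS_PER_MODULE = 2
CORES_PER_SERVER = 64
GFLOPS_PER_CHIP = 256.0    # Power7 figure quoted for 2010
CLOCK_GHZ = 4.0

chips_per_server = MODULES_PER_SERVER * CHIPS_PER_MODULE
cores_per_chip = CORES_PER_SERVER // chips_per_server

# Peak for the whole 2U server, converted from gigaflops to teraflops.
server_peak_tflops = chips_per_server * GFLOPS_PER_CHIP / 1000.0

# Floating-point operations per core per clock cycle implied by the figures.
flops_per_core_per_cycle = GFLOPS_PER_CHIP / (cores_per_chip * CLOCK_GHZ)

print(f"Chips per server: {chips_per_server}")
print(f"Cores per chip: {cores_per_chip}")
print(f"Server peak: {server_peak_tflops:.3f} teraflops")
print(f"Implied flops per core per cycle: {flops_per_core_per_cycle:.0f}")
```

Eight chips at 256 gigaflops each give 2.048 teraflops, which lines up with the 2 teraflops quoted for the 64-core 2U system.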

Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technology and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.