The next semiconductor node provides only about a 25% boost, so efficient chip stacking is a path forward

The difference between 28 and 20 nm is about 20 to 25 percent, according to Nvidia’s chief scientist, Bill Dally. Therefore, process doesn’t matter that much anymore. If Nvidia is clever about architecture and circuit design, it can make up for the fact that it has competitors [Intel] that are a node ahead.

Chip stacking is increasingly seen as an alternative to moving to the next semiconductor node at a time when process technology is providing less bang for the buck.

Nvidia says it will roll out Volta, a graphics chip using stacked memory, in 2015.

One of the clever architectures that engineers in Nvidia’s labs are working on is a ground-referenced signaling scheme geared for future system-in-package devices. The approach, still in research, promises links running at less than half a picojoule per bit at 20 Gbits/second, said Dally.
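To put the energy figure in perspective, here is a minimal back-of-the-envelope sketch of the power one such link would draw. It treats the quoted "less than half a picojoule per bit" as an upper bound of 0.5 pJ/bit and assumes the link runs flat out at 20 Gbits/second; the function and constant names are illustrative, not anything from Nvidia.

```python
# Rough upper bound on per-link power from the figures Dally quoted.
# Assumption: 0.5 pJ/bit is treated as a ceiling, and the link is fully utilized.

def link_power_watts(energy_per_bit_joules: float, bit_rate_bps: float) -> float:
    """Power = energy per bit x bits per second."""
    return energy_per_bit_joules * bit_rate_bps

ENERGY_PER_BIT = 0.5e-12   # 0.5 picojoule per bit (upper bound from the article)
BIT_RATE = 20e9            # 20 Gbits/second

power_mw = link_power_watts(ENERGY_PER_BIT, BIT_RATE) * 1e3
print(f"{power_mw:.1f} mW per link")  # about 10 mW
```

Under those assumptions a single link stays at or below roughly 10 milliwatts, which is why the scheme is attractive for packages with many links.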

The I/O could enable organic substrates that are less expensive than silicon interposers but need physically larger links. Nvidia wants individual links that run at 10 Gbits/second per pin, about ten times the rate of today’s links, to enable components with 200 Gbytes/s bandwidth, Dally said.
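The bandwidth target translates into a pin count with simple arithmetic. The sketch below assumes 8 bits per byte, no encoding or protocol overhead, and that the 200 Gbytes/s figure is the raw aggregate rate; those assumptions and the helper name are mine, not Dally's.

```python
import math

# How many signal pins does a given aggregate bandwidth need?
# Assumptions: 8 bits per byte, no encoding overhead, every pin fully used.

def pins_needed(bandwidth_gbytes_per_s: float, pin_rate_gbits_per_s: float) -> int:
    total_gbits_per_s = bandwidth_gbytes_per_s * 8
    return math.ceil(total_gbits_per_s / pin_rate_gbits_per_s)

fast_pins = pins_needed(200, 10)  # target rate of 10 Gbits/second per pin
slow_pins = pins_needed(200, 1)   # roughly today's rate, one-tenth of the target

print(fast_pins, slow_pins)
```

With these assumptions the target works out to 160 pins at 10 Gbits/second, versus 1,600 pins at one-tenth that rate, which illustrates why faster per-pin signaling matters when the links on an organic substrate are physically larger.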

IBM has used relatively large organic substrates for processor modules measuring as much as 100 mm on a side, Dally said. He sees the substrates used in 2.5-D stacks where a graphics die is laid next to a DRAM stack. Graphics chips generate too much heat to be stacked vertically with memories, and such stacks face relatively high costs and low yields, he added.

Nvidia chief executive Jen-Hsun Huang announced the company will ship in 2015 a next-generation graphics processor called Volta that uses stacked memories.

A Georgia Tech researcher working on 3-D stacks with through-silicon vias was more skeptical.

“It seems organic interposers will win in terms of cost, yield, and reliability, and silicon interposers will win on interconnect size/pitch, performance, and power,” said Lim Sung Kyu. “If the target application calls for high memory bandwidth, I am not even sure if organic interposers can even meet the requirements,” he said.

To get to tomorrow’s exascale systems, chips need to slim down from about 100 picojoules per flop today to about 20, and they need to migrate from programming millions to billions of nodes, Dally said. Nvidia’s graphics are now used in about 50 of the world’s top supers, thanks in part to the maturity of Cuda.
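The energy-per-flop figures map directly onto the power budget of an exascale machine. The following is a rough sketch that assumes a sustained 10^18 flops and counts only the energy attributed to each flop, ignoring cooling and any other overhead not captured in that figure; the names are illustrative.

```python
# Power draw of an exascale system at a given energy per flop.
# Assumptions: sustained 1e18 flops; overheads outside the per-flop figure are ignored.

EXAFLOPS = 1e18  # floating-point operations per second

def system_power_megawatts(picojoules_per_flop: float,
                           flops_per_second: float = EXAFLOPS) -> float:
    watts = picojoules_per_flop * 1e-12 * flops_per_second
    return watts / 1e6

today_mw = system_power_megawatts(100)   # about 100 MW at today's efficiency
target_mw = system_power_megawatts(20)   # about 20 MW at the target efficiency

print(f"{today_mw:.0f} MW today vs {target_mw:.0f} MW at the target")
```

At today's 100 picojoules per flop an exascale system would need on the order of 100 megawatts, while the 20 picojoule target brings that down to about 20 megawatts.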

In terms of investments, “China has a road map to get to exascale before anyone else and it’s putting piles of money on it,” Dally said. “Europe’s exascale program has not been scaled back” despite its economic woes, but funding for the U.S. initiative “is getting pushed back,” he said.
