Nvidia's so-called Echelon system is just a paper design backed up by simulations, so it could change radically before it gets built. Elements of its chip designs ultimately are expected to show up across the company's portfolio of handheld to supercomputer graphics products.
Dally described a graphics core that can process a floating-point operation using just 10 picojoules of energy, down from 200 picojoules on Nvidia's current Fermi chips. Eight of the cores would be packaged on a single streaming multiprocessor (SM), and 128 of the SMs would be packed into one chip.
The result would be a thousand-core graphics chip (1,024 cores in all), with each core capable of handling four double-precision floating-point operations per clock cycle, the equivalent of 10 teraflops on a chip. A chip with just eight of the cores would someday power a handset.
The Echelon chip packs just twice as many cores as today's high-end Nvidia GPUs. However, today's cores handle just one double-precision floating-point operation per cycle, compared to four for the Echelon chip.
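The figures quoted above can be cross-checked with some back-of-the-envelope arithmetic. The Python sketch below uses only numbers from this article. The clock rate is not stated anywhere, so the value computed here is simply what the 10-teraflop figure would imply, and the wattage counts only the energy of the floating-point operations themselves, ignoring memory traffic and everything else on the chip.

```python
# Figures taken from the article.
CORES_PER_SM = 8
SMS_PER_CHIP = 128
DP_FLOPS_PER_CORE_PER_CYCLE = 4
TARGET_FLOPS = 10e12              # 10 teraflops
ECHELON_JOULES_PER_FLOP = 10e-12  # 10 picojoules
FERMI_JOULES_PER_FLOP = 200e-12   # 200 picojoules

# Core count and per-cycle throughput of the proposed chip.
total_cores = CORES_PER_SM * SMS_PER_CHIP
flops_per_cycle = total_cores * DP_FLOPS_PER_CORE_PER_CYCLE

# Assumption: the clock below is inferred from the 10-teraflop figure,
# not quoted by Nvidia.
implied_clock_hz = TARGET_FLOPS / flops_per_cycle

# Power spent on arithmetic alone at 10 teraflops (joules per flop times
# flops per second gives watts).
echelon_watts = ECHELON_JOULES_PER_FLOP * TARGET_FLOPS
fermi_watts = FERMI_JOULES_PER_FLOP * TARGET_FLOPS

# "Twice as many cores" as today's high-end GPU, which does one
# double-precision operation per core per cycle.
todays_cores = total_cores // 2
per_cycle_ratio = flops_per_cycle // (todays_cores * 1)

print(f"cores per chip:       {total_cores}")
print(f"DP flops per cycle:   {flops_per_cycle}")
print(f"implied clock:        {implied_clock_hz / 1e9:.2f} GHz")
print(f"arithmetic power:     {echelon_watts:.0f} W at 10 pJ, "
      f"{fermi_watts:.0f} W at 200 pJ")
print(f"per-cycle advantage:  {per_cycle_ratio}x over today's GPU")
```

On these numbers the energy figure matters more than the core count: at Fermi's 200 picojoules per operation, the arithmetic alone at 10 teraflops would draw on the order of kilowatts.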
Many of the advances in the chip come from its use of memory. The Echelon chip will use 256 Mbytes of SRAM that can be dynamically configured to meet the needs of an application.
The SRAM could be broken up into as many as six levels of cache, each of a variable size. At the lowest level each core would have its own private cache. The goal is to get data as close to the processing elements as possible, reducing the need to move data around the chip, which wastes energy.
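To make the configurable hierarchy a little more concrete, here is a minimal Python sketch of the two limits described above: no more than six cache levels, and no more than 256 Mbytes in total. The function name and the example splits are hypothetical, invented purely for illustration. Nvidia has not published how the SRAM would actually be partitioned or what interface would control it.

```python
TOTAL_SRAM_MBYTES = 256  # from the article
MAX_CACHE_LEVELS = 6     # from the article


def is_valid_partition(level_sizes_mbytes):
    """Return True if a hypothetical split of the SRAM respects both limits.

    level_sizes_mbytes lists the size of each cache level in Mbytes,
    lowest (closest to the cores) first.
    """
    if not 1 <= len(level_sizes_mbytes) <= MAX_CACHE_LEVELS:
        return False
    if any(size <= 0 for size in level_sizes_mbytes):
        return False
    return sum(level_sizes_mbytes) <= TOTAL_SRAM_MBYTES


# Made-up example splits, not figures from Nvidia.
three_level = [16, 48, 192]                   # fits: 3 levels, 256 Mbytes
seven_level = [1, 2, 4, 8, 16, 32, 64]        # too many levels
oversized = [128, 256]                        # exceeds 256 Mbytes

for name, split in [("three_level", three_level),
                    ("seven_level", seven_level),
                    ("oversized", oversized)]:
    print(name, is_valid_partition(split))
```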
To ease programming, the design is cache coherent across both graphics and traditional processor cores.