If the SGI petaFLOP-in-one-cabinet supercomputer does not turn out to be vaporware, then a 100-cabinet system, which is a relatively standard size for a supercomputer, would deliver 100 petaFLOPS, and such a system could exist by 2012. Another chip revision could then bring the tenfold performance boost needed to reach an exaflop supercomputer by 2015.
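The scaling argument is simple multiplication. The sketch below assumes exactly 1 petaFLOPS per cabinet (the claimed SGI figure), 100 cabinets, perfect linear scaling across cabinets, and a tenfold per-cabinet gain from one more chip revision. Real systems lose some efficiency to interconnect and software overhead, so treat these as peak numbers.

```python
# Back-of-envelope peak-performance scaling (assumes perfect linear scaling).
PETA = 10**15
EXA = 10**18


def system_flops(flops_per_cabinet: int, cabinets: int) -> int:
    """Peak system FLOPS: per-cabinet peak times the number of cabinets."""
    return flops_per_cabinet * cabinets


per_cabinet_2012 = 1 * PETA                           # assumed SGI single-cabinet peak
system_2012 = system_flops(per_cabinet_2012, 100)     # 100-cabinet system
system_2015 = system_flops(per_cabinet_2012 * 10, 100)  # after a 10x chip revision

print(f"2012: {system_2012 / PETA:.0f} petaFLOPS")
print(f"2015: {system_2015 / EXA:.0f} exaFLOPS")
```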
The SGI Hybrid platform offers GPU processing capabilities from NVIDIA® and ATI, as well as accelerator-based technology from Tilera® and other PCI Express (PCIe)-based solutions. The technology is expected to appear in SGI products by the end of 2010.
Tilera announced the first 100-core processor in 2009. The TILE-Gx family, fabricated in TSMC’s 40-nanometer process, operates at up to 1.5 GHz with power consumption ranging from 10 to 55 watts.
Some of the technology highlights of the Tilera chips include:
* Next-generation 64-bit core: New three-issue 64-bit core with full virtual memory system. Each core includes 32KB L1 I-cache, 32KB L1 D-cache and 256KB L2 cache, with up to 26MB total L3 coherent cache across the device.
* Enhanced SIMD instruction extensions: Improved signal processing performance with a 4 MAC/cycle multiplier unit delivering up to 600 billion MACs per second, more than 12x the fastest commercial DSP.
* Integrated high-performance DDR3 memory controllers: Two or four 72-bit controllers running at data rates up to 2133 MT/s (DDR3-2133) with ECC support. Up to 1TB total capacity, with memory striping modes for maximum utilization.
* Hardware acceleration engines: On-chip MiCA™ (Multistream iMesh Crypto Accelerator) system delivers up to 40Gbps encryption and 20Gbps full duplex compression processing, tightly coupled to the iMesh for extremely low latency and wire-speed small packet throughput. In addition, a high-performance true random number generator (RNG) and public key accelerator enable up to 50,000 RSA handshakes per second.
* Packet processing accelerator: mPIPE™ (multicore Programmable Intelligent Packet Engine) system provides wire-speed packet classification, load balancing and buffer management. This flexible, C-programmable engine delivers 80 Gbps and 120 million packets-per-second of throughput for packets with multiple layers of encapsulation.
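Several of the headline figures above can be cross-checked with simple arithmetic. The sketch below is a hedged sanity check, not Tilera's own derivation. It assumes the top part has 100 cores at 1.5 GHz, that four controllers are populated, that each 72-bit DDR3 channel carries 64 data bits plus 8 ECC bits, that "2133" is the DDR3-2133 transfer rate, and that the 120 million packets-per-second figure refers to minimum-size 64-byte Ethernet frames. Reading the 26MB L3 as the pooled per-core L2 caches is also an assumption made here.

```python
# Sanity checks on quoted TILE-Gx figures (all inputs are stated assumptions).
CORES = 100
CLOCK_HZ = 1.5e9
MACS_PER_CYCLE = 4
L2_PER_CORE_BYTES = 256 * 1024

# Peak multiply-accumulate rate: cores x MACs per cycle x clock rate.
peak_macs = CORES * MACS_PER_CYCLE * CLOCK_HZ

# Aggregate L2 if every core's 256KB L2 is pooled into one coherent cache.
total_l2_bytes = CORES * L2_PER_CORE_BYTES


def ddr_peak_bytes_per_s(channels: int, transfers_per_s: float,
                         data_bits: int = 64) -> float:
    """Peak DRAM bandwidth: channels x data bytes per transfer x transfer rate.

    ECC bits (8 of the 72) carry no user data, so only 64 bits are counted.
    """
    return channels * (data_bits // 8) * transfers_per_s


def wire_speed_pps(link_bps: float, frame_bytes: int = 64) -> float:
    """Packets/s at line rate: each frame also occupies 8 bytes of
    preamble/start-of-frame delimiter and a 12-byte inter-frame gap."""
    bytes_on_wire = frame_bytes + 8 + 12
    return link_bps / (bytes_on_wire * 8)


peak_dram = ddr_peak_bytes_per_s(4, 2133e6)
pps = wire_speed_pps(80e9)

print(f"peak MACs/s:    {peak_macs / 1e9:.0f} billion")
print(f"pooled L2:      {total_l2_bytes / 1e6:.1f} MB")
print(f"peak DRAM BW:   {peak_dram / 1e9:.1f} GB/s")
print(f"80 Gbps @ 64 B: {pps / 1e6:.0f} million packets/s")
```

The MAC result reproduces the quoted 600 billion per second exactly, the pooled L2 comes to roughly 26MB, and 80 Gbps of minimum-size frames works out to about 119 million packets per second, which is consistent with the "wire-speed" 120 million figure.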
Tensilica and Berkeley National Labs
Berkeley Lab and Tensilica have been exploring the use of Tensilica’s Xtensa processor cores as the basic building blocks of a massively parallel system design. The Xtensa processor is about 400 times more efficient, in floating-point operations per watt, than a conventional server processor chip.
Quantum-dot cellular automata (QCA) could be one approach to zettaflop systems. A 2006 study looked at various technologies, including reversible computing, and considered plans and timelines for 2015 to 2025. Reversible computing would need to be developed and implemented to keep power consumption at reasonable levels.
Memristors, Graphene and Other Technology to Accelerate to ExaFLOPS and ZettaFLOPS
HP predicts that stacked, low-power memristor chips will be commercialized by 2015. These would speed up memory and bring memory and processing together.
Jim Von Ehr and Zyvex are planning to deliver digital matter from building blocks by 2015 and rudimentary molecular manufacturing by 2020. Zyvex is able to devote about $10-20 million per year to advancing this research, drawing on various DARPA and other grants and the company’s own resources.