SGI announced in June its Altix UV system that supports up to 256 Intel Nehalem-EX processors and up to 16 terabytes of main memory all housed in four cabinets. It uses 32 four-port node controller ASICs designed by SGI.
By December 2010, SGI will ship versions of the system using a 16-port router ASIC to allow users to connect 128 of the four-cabinet nodes into a loosely coupled system supporting eight petabytes of total aggregate memory. The design is the first implementation of what SGI calls a global memory architecture that could scale to support an exascale-class system by 2018.
The Project Mojo systems will come in two rack sizes with different stick capacities. The high-end box will use a modified version of the 24-inch blade rack employed by the Altix UV 1000 supers, which are based on Intel’s Xeon 7500 processors and SGI’s NUMAlink 5 shared memory interconnect. The other will be based on a new 19-inch rack, code-named “Destination,” that aims to replace the 20 different racks that SGI inherited from the merger of SGI and Rackable Systems. The modified 24-inch Altix UV rack will hold 80 sticks, each with two CPUs and two double-wide GPU co-processors. The 19-inch Destination rack will be able to hold 63 sticks.
Assuming SGI can use the AMD FireStream GPUs announced in late June, which are based on the “Cypress” GPUs, in the Project Mojo boxes, the larger 24-inch rack machine using the double-wide FireStream 9370 should hit 422 teraflops of aggregate GPU performance, and the smaller 19-inch rack should come in at 332.6 teraflops.
Using Nvidia’s double-wide, fanless Tesla M2070 GPUs, there would be 164.8 teraflops for the 24-inch rack and 129.8 teraflops for the 19-inch rack. At double precision, the 24-inch rack would deliver 82.4 teraflops with the Tesla M2070s and 84.5 teraflops with the FireStream 9370s.
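The rack totals quoted for the FireStream and Tesla configurations can be reproduced with simple multiplication. The sketch below is only an illustration: the stick counts and the two GPUs per stick come from the figures above, while the per-GPU peak numbers in `PEAK_TFLOPS` are assumptions back-calculated from those rack totals, and the function name `rack_tflops` is made up for this example.

```python
# Stick counts and GPUs per stick are taken from the article.
GPUS_PER_STICK = 2
STICKS = {"24-inch": 80, "19-inch": 63}

# ASSUMPTION: per-GPU peak teraflops, back-calculated from the
# article's rack totals (e.g. 422 / (80 * 2) is about 2.64).
PEAK_TFLOPS = {
    "FireStream 9370": {"single": 2.64, "double": 0.528},
    "Tesla M2070": {"single": 1.03, "double": 0.515},
}


def rack_tflops(rack, gpu, precision="single"):
    """Aggregate peak GPU teraflops for one fully populated rack."""
    gpu_count = STICKS[rack] * GPUS_PER_STICK
    return gpu_count * PEAK_TFLOPS[gpu][precision]


for gpu in PEAK_TFLOPS:
    for rack in STICKS:
        total = rack_tflops(rack, gpu)
        print(f"{gpu}, {rack} rack: {total:.1f} teraflops")
```

These are peak figures only. Sustained performance on real workloads would be lower and depends on the application.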
With eight Tilera 100-core chips on a Project Mojo stick and the same 80 sticks in a 24-inch rack, that works out to 480 trillion integer operations per second. You need a little more than twice this density to reach the integer analog of a petaflops in floating point performance, which is a quadrillion (10^15) integer calculations per second. Luckily, Tilera is working on a 200-core chip, due around 2013, which should help SGI hit that goal.
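The Tilera arithmetic can be checked the same way. This is a rough sketch, not a vendor specification: the per-chip throughput is derived from the 480 trillion figure above, and the 200-core projection assumes that per-core throughput stays the same as on the 100-core part, which is purely a working assumption.

```python
# Figures taken from the article.
STICKS_PER_RACK = 80
CHIPS_PER_STICK = 8
RACK_TERA_OPS = 480.0      # trillion integer ops per second per rack
TARGET_TERA_OPS = 1000.0   # 10^15 ops per second, expressed in trillions

chips_per_rack = STICKS_PER_RACK * CHIPS_PER_STICK

# Derived: implied throughput of one 100-core chip, in trillions of ops/s.
tera_ops_per_chip = RACK_TERA_OPS / chips_per_rack

# How much denser the rack must get to reach 10^15 ops per second.
density_gap = TARGET_TERA_OPS / RACK_TERA_OPS

# ASSUMPTION: a 200-core chip keeps the same per-core throughput.
tera_ops_per_core = tera_ops_per_chip / 100
rack_tera_ops_200_core = chips_per_rack * 200 * tera_ops_per_core

print(f"Chips per rack: {chips_per_rack}")
print(f"Implied tera-ops per 100-core chip: {tera_ops_per_chip:.2f}")
print(f"Density increase needed: {density_gap:.2f}x")
print(f"Rack with 200-core chips: {rack_tera_ops_200_core:.0f} tera-ops")
```

Under that linear-scaling assumption, doubling the core count alone brings a rack close to the 10^15 mark but leaves it slightly short, which is consistent with needing "a little more than twice" the density.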
In 2015, the Nvidia chip that follows the Maxwell chip should have two to four times the performance. Exaflop systems seem to be within reach in the 2013-2015 timeframe.
Faster optical networking from IBM and others will also help speed up high-end systems.