BlueGene/Q is expected to reach a peak performance of 20 petaflops when it goes into operation as the “Sequoia” supercomputer at Lawrence Livermore National Laboratory in 2012. However, the architecture described in a recent IBM patent application would increase that performance to 107 petaflops.
Patent Application 20110219208 - Multi-petascale Highly Efficient Parallel Supercomputer.
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each computing node comprises a system-on-chip ASIC with four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors between functions such as compute and messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.
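To illustrate the five-dimensional torus interconnect the patent describes, here is a minimal sketch of torus geometry. The dimension sizes below are hypothetical placeholders, not the machine's actual geometry; the sketch only shows the two properties a torus gives you: every node has ten direct links (two per dimension), and wrap-around links shorten worst-case hop counts.

```python
# Hypothetical 5D torus dimensions -- illustrative only, not from the patent.
DIMS = (4, 4, 4, 4, 2)

def neighbors(coord, dims=DIMS):
    """In a 5D torus each node has 2 links per dimension, 10 links in total."""
    result = []
    for axis in range(len(dims)):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]  # modular wrap-around link
            result.append(tuple(n))
    return result

def hop_distance(a, b, dims=DIMS):
    """Minimal hops between two nodes, taking the shorter way around each ring."""
    return sum(min((x - y) % d, (y - x) % d) for x, y, d in zip(a, b, dims))

origin = (0, 0, 0, 0, 0)
print(len(neighbors(origin)))                 # 10 links per node
print(hop_distance(origin, (3, 0, 0, 0, 0)))  # 1 hop via the wrap-around link
```

The wrap-around check is the point: node (3,0,0,0,0) is three hops away on a plain mesh but only one hop away on a torus, which is why torus topologies keep latency low at this scale.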
A massive patent filing (#20110219208) from January of this year, with more than 649 pages and 2,263 individual claims and descriptions, explains that the basic architecture of the system consists of 1024 compute node ASICs per rack across 512 racks (a total of 524,288 nodes and 8,388,608 cores). Each compute node uses BlueGene/Q’s 4-way hardware-threaded PowerPC A2 CPU architecture, giving each node 16 compute cores. IBM said that each chip in fact has 18 cores: 1 core is used to improve chip yield, 1 core is used for system control, and 16 are available for actual computation. Each node includes 32 MB of on-chip memory, sliced into 16 equal 2 MB parts, one for each compute core. The total memory bandwidth per node is 563 GB/s. In comparison, the Sequoia system will have 1,572,864 cores, 98,304 compute nodes and 96 racks.
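The node and core counts above follow directly from the per-rack figure, which can be cross-checked with a few lines of arithmetic (all constants are taken from the article itself):

```python
# Figures quoted in the article for the patent-scale system and Sequoia.
NODES_PER_RACK = 1024
COMPUTE_CORES_PER_NODE = 16   # 18 physical cores: 16 compute + 1 control + 1 spare

def totals(racks):
    """Total compute nodes and compute cores for a given rack count."""
    nodes = racks * NODES_PER_RACK
    cores = nodes * COMPUTE_CORES_PER_NODE
    return nodes, cores

print(totals(512))  # patent-scale system: (524288, 8388608)
print(totals(96))   # Sequoia:             (98304, 1572864)
```

Both results match the article's figures, confirming that the 512-rack patent design is roughly 5.3 times the size of the 96-rack Sequoia installation.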