The petaflop supercomputers we have now are too large and consume too much power. Petascale machines have a footprint of about 1/10th of a football field and consume several megawatts (MW). One megawatt costs about one million dollars per year.
What is needed for exaflop-scale computing?
* A single commodity chip should deliver terascale performance, namely 10^12 operations per second (tera-ops).
* Running one million energy- and space-efficient terascale chips would enable exascale systems that deliver 10^18 operations per second.
* To attain extreme-scale computing, researchers must address architectural challenges in energy and power efficiency, concurrency and locality, resiliency, and programmability.
* A possible target for extreme-scale computing is an exa-op data center that consumes 20 MW, a peta-op departmental server that consumes 20 kilowatts (kW), and a tera-op chip multiprocessor that consumes 20 watts (W). These numbers imply that the machine must deliver 50 giga-operations per second (50 × 10^9 operations per second) for every watt. Because a watt is one joule per second, each operation can consume, on average, only 20 picojoules (pJ).
* Intel’s Core Duo mobile processor (2006) used 10,000 pJ per instruction.
* Large machines spend most of their energy transferring data to or from remote caches, memories, and disks. Minimizing data transport energy, rather than arithmetic logic unit (ALU) energy, is the real challenge.
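
The targets in the list above can be checked with a few lines of arithmetic. The following is only a sketch: the function and constant names are made up for illustration, and the inputs are the figures quoted in this post (20 MW, 20 kW, and 20 W targets, 10,000 pJ per instruction for the 2006 Core Duo, and roughly one million dollars per megawatt-year).

```python
# Back-of-the-envelope check of the power targets quoted in the post.
# All names are illustrative; the numbers come from the text above.

TARGETS = {
    # name: (power in watts, operations per second)
    "exa-op data center": (20e6, 1e18),
    "peta-op departmental server": (20e3, 1e15),
    "tera-op chip multiprocessor": (20.0, 1e12),
}

CORE_DUO_PJ_PER_INSTRUCTION = 10_000  # 2006 mobile processor, per the post
DOLLARS_PER_MW_YEAR = 1_000_000       # rule of thumb quoted in the post


def ops_per_second_per_watt(power_watts, ops_per_second):
    """Throughput delivered for every watt consumed."""
    return ops_per_second / power_watts


def picojoules_per_op(power_watts, ops_per_second):
    """Average energy per operation; a watt is one joule per second."""
    return power_watts / ops_per_second * 1e12


def chips_needed(system_ops_per_second, chip_ops_per_second):
    """Number of chips required, assuming perfect scaling."""
    return system_ops_per_second / chip_ops_per_second


def efficiency_gap(current_pj_per_op, target_pj_per_op):
    """How many times more energy per operation the current design uses."""
    return current_pj_per_op / target_pj_per_op


def annual_power_cost_dollars(power_watts):
    """Yearly electricity cost using the per-megawatt rule of thumb."""
    return power_watts / 1e6 * DOLLARS_PER_MW_YEAR


if __name__ == "__main__":
    for name, (watts, ops) in TARGETS.items():
        gops_per_watt = ops_per_second_per_watt(watts, ops) / 1e9
        pj = picojoules_per_op(watts, ops)
        print(f"{name}: {gops_per_watt:.0f} Gops/s per watt, {pj:.0f} pJ per op")

    gap = efficiency_gap(CORE_DUO_PJ_PER_INSTRUCTION, 20.0)
    print(f"terascale chips for one exa-op system: {chips_needed(1e18, 1e12):.0f}")
    print(f"Core Duo gap versus a 20 pJ target: {gap:.0f}x")
    print(f"20 MW data center: ${annual_power_cost_dollars(20e6):,.0f} per year")
```

All three targets work out to the same 20 pJ per operation, which is why the 10,000 pJ figure for the 2006 processor implies a gap of about 500×. Treating an instruction and an operation as equivalent is a simplification made here for the comparison.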
Several technologies are needed.
Near-threshold voltage operation.
One of the most effective approaches for energy-efficient operation is to reduce the supply voltage (Vdd) to a value only slightly higher than the transistor threshold voltage (Vth). This is called near-threshold voltage (NTV) operation. It corresponds to a Vdd value of around 0.4 V, compared to a Vdd of around 1 V for current designs.
Broadly speaking, operation under NTV can reduce the gates’ power consumption by about 100× while increasing their delay by 10×. The result is a total energy savings of one order of magnitude. In addition to the 10× increase in circuit delay, the close proximity of Vdd and Vth induces a 5× increase in gate delay variation due to process variation, and a several orders-of-magnitude increase in logic failures, especially in memory structures, which are less variation tolerant.
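
The order-of-magnitude figure follows from energy being power multiplied by time. A minimal sketch of that arithmetic, using the rough 100× and 10× factors quoted above (the function names are illustrative, and the parallelism estimate assumes perfectly parallel work, which real workloads will not achieve):

```python
# Rough NTV trade-off: energy per operation = power x time per operation.
# The default factors are the approximate figures quoted in the post.


def ntv_relative_energy(power_reduction=100.0, delay_increase=10.0):
    """Energy per operation at NTV relative to nominal Vdd (1.0 = same)."""
    relative_power = 1.0 / power_reduction
    relative_time = delay_increase
    return relative_power * relative_time


def extra_parallelism_for_same_throughput(delay_increase=10.0):
    """Units needed to match nominal throughput.

    Assumption: the work is perfectly parallel, so each slower unit
    contributes 1/delay_increase of the original throughput.
    """
    return delay_increase


if __name__ == "__main__":
    rel = ntv_relative_energy()
    print(f"relative energy per operation at NTV: {rel:.2f}")
    print(f"energy savings factor: {1.0 / rel:.0f}x")
    print(f"parallel units to hold throughput: "
          f"{extra_parallelism_for_same_throughput():.0f}x")
```

Under these assumptions, the slower gates have to be offset by roughly ten times more concurrency to keep throughput constant, which ties the energy savings to the concurrency challenge listed earlier.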
Aggressive use of circuit or architectural techniques that minimize or tolerate process variation can address the higher-variation shortcoming. This includes techniques such as body biasing and variation-aware job scheduling. Finally, novel designs of memory cells and other logic can solve the problem of higher probability of logic failure. Overall, NTV operation is a promising direction that several research groups are pursuing.
Nonsilicon memory. Nonsilicon memory is another relevant technology. Phase change memory (PCM), which is currently receiving much attention, is one type of nonsilicon memory. PCM uses a storage element composed of two electrodes separated by a resistor and phase-change material such as Ge2Sb2Te5. A current through the resistor heats the phase-change material, which, depending on the temperature conditions, changes between a crystalline (low-resistivity) state and an amorphous (high-resistivity) one, hence recording one of the two values of a bit. PCM’s main attraction is its scalability with process technology. Indeed, both the heating contact areas and the required heating current shrink with each technology generation. Therefore, PCM will enable denser, larger, and very energy-efficient main memories. DRAM, on the other hand, is largely a nonscalable technology, which needs sizable capacitors to store charge and, therefore, requires sizable transistors.
Currently, PCM has longer access latencies than DRAM, higher energy per access (especially for writes), and a limited lifetime in the number of writes. However, advances in circuits and memory architectures will hopefully deliver improvements along all these axes while retaining PCM scalability. Finally, because PCM is nonvolatile, it can potentially support novel, inexpensive checkpointing schemes for extreme-scale architectures. Researchers can also use it to design interesting, hybrid main memory organizations by combining it with plain DRAM modules.
Other system technologies. Several other technologies will likely significantly impact energy and power efficiency. An obvious one is 3D die stacking, which will reduce memory access power. A 3D stack might contain a processor die and memory dies, or it might contain only memory dies. The resulting compact design eliminates energy-expensive data transfers, but introduces manufacturing challenges, such as the interconnection between stacked dies through vias. Interestingly, such designs, by enabling high-bandwidth connections between memories and processors, might also induce a reorganization of the processor’s memory hierarchy. Very high bandwidth caches near the cores are possible.
Efficient on-chip voltage conversion is another enabling system technology. The goal here is for the machine to be able to change the voltage of small groups of cores in tens of nanoseconds, so they can adapt their power to the threads running on them or to environmental conditions. A voltage controller in each group of cores can regulate the group’s voltage. Hopefully, the next few years will see advances in this area.
Photonic interconnects. Optics have several key properties that can be used for interconnects. They include low-loss communication, very large message bandwidths enabled by wavelength parallelism, and low transport latencies, as given by the speed of light.
Substantial advances in architecture and hardware technologies should appear in the next few years. For extreme-scale computing to become a reality, we need to revamp most of the subsystems of current multiprocessors. Many aspects remain wide open, including effective NTV many-core design and operation; highly energy-efficient checkpointing; rearchitecting the memory and disk subsystems for low energy and fewer parts; incorporating high-impact technologies such as nonvolatile memory, optics, and 3D die stacking; and developing cost-effective cooling technologies.
The challenges of achieving exaflop computing are listed in this article, which refers to a 297-page PDF on the issues.