HPCWire – Thomas Sterling, Professor of Informatics & Computing at Indiana University, takes us through some of the most critical developments in high performance computing, explaining why the transition to exascale is going to be very different than the ones in the past and how the United States is losing its leadership in HPC innovation.
Exascale is also different because unlike previous milestones, it is unlikely that we will face yet another one in the future. These words may be thrown back in my face, but I think we will never reach zettaflops, at least not by doing discrete floating point operations. We are reaching the anvil of the technology S-curve and will be approaching an asymptote of single program performance due to a combination of factors including atomic granularity at nanoscale.
A new execution model as an embodiment of a paradigm shift will drive this transition from old systems to new. We have done this before in the case of scalar to vector and SIMD, and again from these to message passing, MPPs, and clusters. We are now simply — or not so simply — facing another phase shift in HPC system programming, structure, and operation.
Of course I anticipate something else will be devised that is beyond my imagination, perhaps something akin to quantum computing, metaphoric computing, or biological computing. But whatever it is, it won’t be what we’ve been doing for the last seven decades. That is another unique aspect of the exascale milestone and activity. For a number, I’m guessing about 64 exaflops to be the limit, depending on the amount of pain we are prepared to tolerate.
Thomas Sterling will be delivering the Wednesday keynote at this year’s International Supercomputing Conference (ISC’12), which will take place in Hamburg, Germany from June 17-21. His presentation will examine the achievements over the past 12 months in high performance computing.
What’s the biggest hardware challenge to attain exascale computing?
Sterling: The usual response to this question is either “power” or “resilience” and these are certainly critical challenges to attaining exascale. Depending on the analysis of choice, without innovative ways of managing vertical and lateral data movement power estimates based on anticipated technology trends suggest an order of magnitude greater power demand than is considered practical. Single point failure modes of systems comprising hundreds of millions of cores will exhibit mean-time-to-interrupt on the order of minutes or many seconds, much less than the expected time to service a checkpoint or restart cycle using conventional methods.
While both are clearly important, I think the biggest hardware challenge is architecture. This may surprise many of our colleagues because there is a general expectation that the system architecture is likely to be an evolutionary extension of the current mix of multicore sockets and GPU accelerators. This view is driven by the number one concern, which is parallelism and the need to expose and exploit it. Not only will the system architecture have to provide sufficient hardware concurrency of on the order of a billion or more simultaneous actions for the throughput requirement, it will have to use more of it as a latency mitigating method requiring additional architecture change.
Further, it will have to incorporate mechanisms to reduce overheads in order to make effective use of finer granularity tasks (e.g., lightweight user threads) such as the instantiation of remote actions. Support for advanced forms of global address spaces, their management, and address translation will be required in support of randomly distributed global data (e.g., dynamic graphs). New mechanisms for efficient semantically rich synchronization and continuation (control object) migration to manage locality of control will be part of future designs if they are to succeed at unprecedented scale.
Additional hardware mechanisms will be required for fault tolerance including error detection, isolation, in-memory checkpointing, and recovery through reconfiguration. Power reduction will demand active sensor and control hardware mechanisms to continuously adjust energy usage based on application demands. New processor cores and their relationship to memory (for example, processor in memory) for superior bandwidth, reduced latency, and lower power will further drive hardware innovation needs.