Over the last twenty years, the open source community has provided more and more of the software on which the world’s High Performance Computing (HPC) systems depend for performance and productivity. The community has invested millions of dollars and years of effort to build key components. Although the investments in these separate software elements have been tremendously valuable, a great deal of productivity has also been lost for lack of the planning, coordination, and integration of key technologies needed to make them work together smoothly and efficiently, both within individual petascale systems and between different systems. It seems clear that this completely uncoordinated development model will not provide the software needed to support the unprecedented parallelism required for peta/exascale computation on millions of cores, or the flexibility required to exploit new hardware models and features, such as transactional memory, speculative execution, and GPUs. This report describes the work of the community to prepare for the challenges of exascale computing, ultimately combining its efforts in a coordinated International Exascale Software Project (IESP).
Unpacking the elements of the goal statement in the context of the work performed so far by the IESP reveals some of the characteristics that the X-stack must possess, at minimum:
* The X-stack must enable suitably designed science applications to exploit the full resources of the largest systems: The main goal of the X-stack is to support groundbreaking research on tomorrow’s exascale computing platforms. By using these massive platforms and X-stack infrastructure, scientists should be empowered to attack problems that are much larger and more complex, make observations and predictions at much higher resolution, explore vastly larger data sets, and reach solutions dramatically faster. To achieve this goal, the X-stack must enable scientists to use the full power of exascale systems.
* The X-stack must scale both up and down the platform development chain: Science today is done on systems at a range of different scales, from departmental clusters to the world’s largest supercomputers. Since leading research applications are developed and used at all levels of this platform development chain, the X-stack must support them well at all these levels.
* The X-stack must be highly modular, so as to enable alternative component contributions: The X-stack is intended to provide a common software infrastructure on which the entire community builds its science applications. For both practical and political reasons (e.g., sustainability, risk mitigation), the design of the X-stack should strive for a modularity that allows many groups to contribute and that accommodates more than one choice in each software area.
* The X-stack must offer open source alternatives for all of its components: For both technical and mission-oriented reasons, the scientific software research community has long played a significant role in the open source software movement. Continuing this important tradition, the X-stack will offer open source alternatives for all of its components, even though it is clear that exascale platforms from particular vendors may support, or even require, some proprietary software components as well.
Among the critical aspects of the systems expected to be available by the end of the next decade, the following can be predicted with some confidence:
* Feature size of 22 to 11 nanometers, CMOS in 2018
* Total average of 25 picojoules per floating point operation
* Approximately 10 billion-way concurrency for simultaneous operation and latency hiding
* 100 million to 1 billion cores
* Clock rates of 1 to 2 GHz
* Multithreaded, fine-grained concurrency of 10- to 100-way per core
* Hundreds of cores per die (varies dramatically depending on core type and other factors)
* Global address space without cache coherence; extensions to PGAS (e.g., AGAS)
* 128-petabyte capacity mix of DRAM and nonvolatile memory (most expensive subsystem)
* Explicitly managed high-speed buffer caches; part of deep memory hierarchy
* Optical communications for distances > 10 centimeters, possibly intersocket
* Optical bandwidth of 1 terabit per second
* Systemwide latencies on the order of tens of thousands of cycles
* Active power management to eliminate energy wasted by momentarily unused cores
* Fault tolerance by means of graceful degradation and dynamically reconfigurable structures
* Hardware-supported rapid thread context switching
* Hardware-supported efficient message-to-thread conversion for message-driven computation
* Hardware-supported, lightweight synchronization mechanisms
* 3-D packaging of dies for stacks of 4 to 10 dies each including DRAM, cores, and networking
Because of the nature of the development of the underlying technology, most of the predictions above have an error margin of ±50% or a factor of 2, independent of specific roadblocks that may prevent reaching the predicted value.
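
Several of the projections listed above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only and is not part of the IESP material. It assumes that "exascale" means a sustained 10^18 floating point operations per second, a figure the list itself does not state, and all function and constant names are hypothetical. It multiplies the energy-per-operation figure by that rate to get a power estimate, and multiplies the core-count range by the per-core concurrency range to see whether the result brackets the quoted 10 billion-way concurrency.

```python
# Back-of-envelope consistency check of the projections listed above.
# ASSUMPTION (not stated in the list): an exascale system sustains
# 10**18 floating point operations per second.
EXA_FLOP_PER_SECOND = 10**18

# Figures taken from the list.
PICOJOULES_PER_FLOP = 25
CORES_RANGE = (100_000_000, 1_000_000_000)   # 100 million to 1 billion cores
THREADS_PER_CORE_RANGE = (10, 100)           # 10- to 100-way per core


def system_power_watts(flop_per_second: int, picojoules_per_flop: int) -> int:
    """Power in watts = (flop/s) * (joules/flop); 1 pJ = 1e-12 J."""
    return flop_per_second * picojoules_per_flop // 10**12


def concurrency_range(cores: tuple, threads_per_core: tuple) -> tuple:
    """Lowest and highest total thread count implied by the two ranges."""
    return (cores[0] * threads_per_core[0], cores[1] * threads_per_core[1])


def factor_of_two_band(value: float) -> tuple:
    """Interval obtained by halving and doubling a predicted value."""
    return (value / 2, value * 2)


power_w = system_power_watts(EXA_FLOP_PER_SECOND, PICOJOULES_PER_FLOP)
power_mw = power_w / 10**6
low, high = concurrency_range(CORES_RANGE, THREADS_PER_CORE_RANGE)

print(f"Power at the assumed rate: {power_mw} MW")
print(f"Factor-of-2 band on power: {factor_of_two_band(power_mw)} MW")
print(f"Implied total concurrency: {low:,} to {high:,} threads")
```

Under these assumptions, 25 picojoules per operation corresponds to 25 MW, or 12.5 to 50 MW once the factor-of-2 margin is applied. The core-count and per-core ranges multiply out to between 1 billion and 100 billion threads, which brackets the roughly 10 billion-way concurrency quoted in the list.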