Beyond Exaflop supercomputers will require new materials, new architectures, new memory and quantum computers

Eurolab HPC tries to assess the future disruptive technology for high performance computing beyond Exascale computers

They survey the currents state of research and development and its potential for the future of the following hardware technologies:
CMOS scaling
Die stacking and 3D chip technologies
Non-volatile Memory (NVM) technologies
Resistive Computing
Neuromorphic Computing
Quantum Computing
Graphene and
Diamond Transistors

They categorize the technologies as:
o Sustaining technologies: CMOS scaling and Die stacking
o Disruptive technologies that potentially create a new line of HPC hardware: NVM and Photonics
o Disruptive technologies that potentially create alternative ways of computing: Resistive, Neuromorphic, and Quantum Computing
o Disruptive technologies that potentially replace CMOS for processor logic: Nanotube, Graphene, and Diamond technologies

Current CMOS technology may presumably scale continuously in the next decade, down to 6 or 5 nm. However, scaling CMOS technology leads to steadily increasing costs per transistor, power consumption, and to less reliability. Die stacking could result in 3D many-core microprocessors with reduced intra core wire length, enabling high transfer bandwidths, lower latencies and reduced communication power consumption.

3D stacking will also be used to scale flash memories, because 2D NAND flash technology does not further scale. In the long run even 3D flash memories will probably be replaced by memristor or other non-volatile memory (NVM) technologies.

These, depending on the actual type, allow higher structural density, less leakage power, faster read- and write access, more endurance and can nevertheless be more cost efficient.

However, the whole memory hierarchy may change in the upcoming decade. DRAM scaling will only continue with new technologies, in fact NVMs, which will deliver nonvolatile memory potentially replacing or being used in addition to DRAM. Some new non-volatile memory technologies could even be integrated on-chip with the microprocessor cores and offer orders of magnitude faster read/write accesses and also much higher endurances than flash. Intel demonstrated the possible fast memory accesses of the 3D-XPoint NVM Technology used in their Optane Technology. HP’s computer architecture proposal called “The Machine” targets a machine based on new NVM memory and photonic busses. The Machine sees the memory instead of processors in the centre. This so called Memory-Driven Computing unifies the memory and storage into one vast pool of memory. HP proposes advanced photonic fabric to
connect the memory and processors. Using light instead of electricity is the key to rapidly accessing any part of the massive memory pool while using much less energy.

The Machine is a first example of the new Storage-class Memory (SCM), i.e., a nonvolatile memory technology in between memory and storage, which may enable new data access modes and protocols that are neither ‘memory’ nor ‘storage’. It would particularly increase efficiency of fault tolerance check pointing, which is potentially needed for shrinking CMOS processor logic that leads to less reliable chips. There is a major impact from this technology on software and computing. SCM provides orders of magnitude increase in capacity with near-DRAM latency which would push software towards in-memory computing.

Resistive Computing, Neuromorphic Computing and Quantum Computing are promising technologies that may be suitable for new hardware accelerators but less for new processor logic. Resistive computing promises a reduction in power consumption and massive parallelism. It could enforce datacentric and reconfigurable computing, leading away from the Von-Neumann architecture. Humans can easily outperform currently available high-performance computers in tasks like vision, auditory perception and sensory motor-control. As Neuromorphic Computing would be efficient in energy and space for artificial neural network applications, it would be a good match for these tasks. More lack of abilities of current computers can be found in the area of unsolved problems in computer science. Quantum Computing might solve some of these problems, with important implications for public-key cryptography, searching, and a number of specialized computing applications.

  • Full 3D stacking may pose further requirements to system software and programming environments: The higher throughput and lower memory latency when stacking memory on top of processing may require changes in programming environments and application algorithms.
  • Stacking specialized (e.g. analog) hardware on top of processing and memory elements lead to new (embedded) high-performance applications.
  • Stacking hardware accelerators together with processing and memory elements require programming environment and algorithmic changes.
  • 3D multicores require software optimizations able to efficiently utilize the characteristics of 3rd dimension, .i.e. e.g., different latencies and throughput for vertical versus horizontal interconnects.
  • 3D stacking may to new form factors that allow for new (embedded) high performance applications.

Photonics will be used to speed up all kind of interconnects – layer to layer, chip to chip, board to board, and compartment to compartment with impacts on system software, programming environments and applications such that:

  • A flatter memory hierarchy will be reached (combined with 3D stacking and NVM) requiring software changes for efficiency redefining what is local in future.
  • It is mentioned that energy-efficient Fourier-based computation is possible as proposed in the Optalysys project.
  • The intrinsic end-to-end nature of an efficient optical channel will favor broadcast/multicast based communication and algorithms.
  • A full photonic chip will totally change software in a currently rarely investigated manner.

A number of new technologies will lead to new accelerators. We envision programming environments that allow defining accelerator parts of an algorithm independent of the accelerator itself. OpenCL is such a language distinguishing “general purpose” computing parts and accelerator parts of an algorithm, where the accelerator part can be compiled to GPUs, FPGAs, or many-cores like the Xeon Phi. Such programming environment techniques and compilers have to be enhanced to improve performance portability and to deal with potentially new accelerators as, e.g., neuromorphic chips, quantum computers, in-memory resistive computing devices etc. System software has to deal with these new possibilities and map computing parts to the right accelerator.

Neuromorphic Computing is particularly attractive for applying artificial neural network and deep learning algorithms in those domains where, at present, humans outperform any currently available high-performance computer, e.g., in areas like vision, auditory perception, or sensory motor-control. Neural information processing is expected to have a wide applicability in areas that require a high degree of flexibility and the ability to operate in uncertain environments where information usually is partial, fuzzy, or even contradictory. The success of the IBM Watson computer is an example for such new application possibilities.

Quantum Computing potentially solves problems impossible by classical computing, but posts challenges to compiler and runtime support. Moreover, quantum error correction is needed due to high error rates (10^-3 ). Applications of quantum computers could be new encryptions, quantum search, quantum random walk, etc.

Resistive Computing may lead to massive parallel computing based on data-centric and reconfigurable computing paradigms. In memory computing algorithms may be executed on specialised resistive computing accelerators.

Quantum Computing, Resistive Computing as well as Graphene and Nanotube-based computing are still highly speculative hardware technologies

Resistive memories, i.e. memristors, are an emerging class of non-volatile memory technology. The memristors electrical resistance is not constant but depends on the history of current that had previously flowed through the device. The device remembers its history—the so-called non-volatility property: when the electric power supply is turned off, the memristor remembers its most recent resistance until it is turned on again.

Among the most prominent memristor candidates and close to commercialization arephase change memory (PCM), metal oxide resistive random accessmemory (RRAM or ReRAM), and conductive bridge random access memory(CBRAM).

PCM can be integrated in the CMOS process and the read/write latency is only by tens of nanoseconds slower than DRAM whose latency is roughly around 100ns. The write endurance is hundreds of millions of writes per cell at current processes. This is why PCM is currently positioned only as a Flash replacement. RRAM offers a simple cell structure which enables reduced processing costs. The endurance can be more than 50 million cycles and the switching energy is very low. RRAM can deliver 100x lower read latency and 20x faster write performance compared to NAND Flash. CBRAM can also write with relatively low and with high speed. The read/write latencies are close to DRAM.

It is foreseeable, that other NVM technologies will supersede current flash memory. PCM for instance might be 1000 times faster and 1000 times more resilient. Some NVM technologies have been considered as a feasible replacement for SRAM. Studies suggest that replacing SRAM with STT-RAM could save 60% of LLC energy with less than 2% performance degradation.

It is unclear when most of the new technologies may be mature enough and which of them will prevail. But this is not of importance, because all have the same goal, namely to revolutionize the current storage technology

A well-known but highly debated example of a quantum computer is the D-Wave machine built by the Canadian company with the same name. It is not yet proven that D-Wave actually uses the above mentioned quantum phenomena nor has any exponential speedup been shown except in one isolated case but which was not considered conclusive by the independent researchers such as M. Schroyer from ETH Zurich. In addition, D-Wave is based on quantum annealing and thus only usable for specific optimization problems.

An alternative direction is to build a universal quantum computer based on quantum gates, such as Hadamard, rotation gates and CNOT. Google, IBM and Intel have all initiated research projects in this domain and currently superconducting qubits seem to be the most promising direction.

Currently, the European Commission is preparing the ground for the launch in 2018 of a €1 billion flagship initiative on quantum technologies

Quantum algorithms (other than quantum key and Shore) are:
o Grover’s Algorithm is the second most famous result in quantum computing. Often referred to as “quantum search,” Grover’s algorithm actually inverts an arbitrary function by searching n input combinations for an output value in √n time.
o Binary Welded Tree is the graph formed by joining two perfect binary trees at the leaves. Given an entry node and an exit node, The Binary Welded Tree Algorithm uses a quantum random walk to find a path between the two. The quantum random walk finds the exit node exponentially faster than a classical random walk.
o Boolean Formula Algorithm can determine a winner in a two player game by performing a quantum random walk on a NAND tree.
o Ground State Estimation algorithm determines the ground state energy of a molecule given a ground state wave function. This is accomplished using quantum phase estimation.
o Linear Systems algorithm makes use of the quantum Fourier Transform to solve systems of linear equations.
o Shortest Vector problem is an NP-Hard problem that lies that the heart of some lattice-based cryptosystems. The Shortest Vector Algorithm makes use of the quantum Fourier Transform to solve this problem.
o Class Number Computes the class number of a real quadratic number field in polynomial time. This problem is related to elliptic-curve cryptography, which is an important alternative to the product-of-two-primes approach currently used in public-key cryptography.
o It is expected that machine learning will be transformed into quantum learning – the prodigious power of qubits will narrow the gap between machine learning and biological learning.

In general, the focus is now on developing algorithms requiring a low number of qubits (a few hundred) as that seems to be the most likely reachable goal in the 10-15 year time frame

Currently the gate length of the fabricated diamond transistors is in the single-digit micrometer range. Compared with the current 22nm technology with gate lengths of about 25nm, a reduction in size is absolutely necessary in order to allow fast working circuits (limitation of the propagation delays).

Producing reasonable diamond wafers for mass production could be possible. The time for producing diamond wafers is another factor that has to be
reduced drastically to compete with other technologies.

SOURce- Eurolab