Supercomputers will reach 2-3 exaflop/s in 16-bit floating point (FP16) in 2018

A double precision exaflop per second has been the traditional definition of a general-purpose exaflop supercomputer. There are domain-specific machines, and even the American DOE Summit and Sierra supercomputers, where the picture is different. These two machines, because of the NVIDIA Volta GPU, will have significant acceleration in reduced-precision FP16 arithmetic through what NVIDIA calls Tensor Cores, which are in reality 4-by-4 FP16 single-cycle matrix engines. The peak performance of a Volta chip in this mode is 120 Tflop/s. So Summit and Sierra, which will deploy these chips by the tens of thousands, may deliver somewhere around 130-200 Petaflop/s in double precision arithmetic, but in terms of FP16 AI flop/s they will reach 2-3 exaflop/s. The world has been fixated on double precision arithmetic as the general-purpose measure, but in reality people are building machines that are a little more domain-specific, and in that sense we will already reach exascale by next year.
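As a rough sanity check on these figures, the sketch below multiplies an assumed GPU count by per-chip peaks. The 120 Tflop/s FP16 Tensor Core figure comes from the text above; the GPU count (about 25,000) and the roughly 7.5 Tflop/s FP64 peak per Volta are illustrative assumptions, not official system specifications.

    # Back-of-envelope sketch of the Summit/Sierra numbers quoted above.
    # Per-chip FP16 peak is taken from the text; the GPU count and the
    # FP64 peak per Volta are illustrative assumptions.

    GPUS = 25_000                 # assumed order-of-magnitude GPU count
    FP16_TFLOPS_PER_GPU = 120     # Tensor Core peak quoted in the text
    FP64_TFLOPS_PER_GPU = 7.5     # assumed double-precision peak per Volta

    fp16_exaflops = GPUS * FP16_TFLOPS_PER_GPU / 1e6   # Tflop/s -> Eflop/s
    fp64_petaflops = GPUS * FP64_TFLOPS_PER_GPU / 1e3  # Tflop/s -> Pflop/s

    print(f"FP16 (Tensor Core) peak: ~{fp16_exaflops:.1f} Eflop/s")
    print(f"FP64 peak:               ~{fp64_petaflops:.0f} Pflop/s")

Under these assumptions the script prints roughly 3 Eflop/s in FP16 and under 200 Pflop/s in FP64, consistent with the ranges quoted above.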

China's second 100 Petaflop/s machine, the Tianhe-2A, the successor to the Tianhe-2, is going to be deployed sometime in 2017 or early next year, and it will use indigenous Chinese technology, since they have been prohibited from using the Intel Xeon Phi (Knights Landing) that was their original plan. Everybody agrees the US ban on Intel chips actually drove them to their goal more quickly. In addition, several other Chinese companies and centers are in the running to reach exascale by 2020 at the earliest, or maybe 2021. Besides the Sunway TaihuLight and Tianhe-2A lines, there is a third project still in the running; prototypes of all three are to be presented and demonstrated before proceeding towards exascale.

The US Exascale Computing Project (ECP) is guided and directed by the Department of Energy. Both sides of the department are involved: the Office of Science and the National Nuclear Security Administration (NNSA). It is a very risk-averse, very responsible project that has been defined under somewhat lower budgets than had been anticipated, but at least with the assumption that it will achieve its end goal shortly after 2020.

Japan is largely on track with its Post-K machine. However, since the last ISC it has been announced that the machine will be delayed by one or two years because semiconductor scaling is slowing down, and as a result the anticipated performance could not be reached with the original plan. Fujitsu and RIKEN had to reorganize around a new plan that targets deployment in the 2021-2022 timeframe. They are also adding new features: it was announced in August that the machine will use an ARM processor with the Scalable Vector Extension (SVE) instruction set.

In Europe, there is a declared intention to build exascale machines with European technology, with the European Commission being very proactive in promoting this new direction. The details have not been disclosed yet, so we will see next year what happens with these European efforts. There are lots of research projects in Europe, but none of them are, I would say, concrete enough by themselves to build these large-scale machines in production; still, I think Europe is finally stepping up to this game. However, compared to other countries, it does not have the industrial backing to the same extent.