Today IBM is applying massively distributed deep learning (DDL) algorithms across clusters of graphics processing units (GPUs), relying on high-speed data movement between them, ultimately to understand images and sound. The DDL algorithms “train” on visual and audio data, and more GPUs should mean faster learning. To date, IBM has reported a record-setting 95 percent scaling efficiency (meaning training speeds up nearly linearly as more GPUs are added) while reaching 33.8 percent accuracy on a 7.5-million-image recognition task, using 256 GPUs on 64 “Minsky” Power systems.
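Scaling efficiency has a simple definition: the measured speedup from adding GPUs divided by the ideal linear speedup. A minimal sketch, using hypothetical timing numbers (not IBM's actual measurements):

```python
# Sketch: scaling efficiency for distributed training.
# Efficiency = measured speedup / ideal linear speedup.
# The timing numbers below are illustrative, not IBM's published data.

def scaling_efficiency(t_single: float, t_multi: float, n_gpus: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    speedup = t_single / t_multi  # measured speedup from distribution
    ideal = n_gpus                # perfect linear scaling
    return speedup / ideal

# Hypothetically: one GPU takes 256 hours, 256 GPUs take 1.05 hours.
eff = scaling_efficiency(256.0, 1.05, 256)
print(f"{eff:.0%}")  # roughly 95% of ideal linear scaling
```

At 95 percent efficiency, 256 GPUs behave like about 243 ideal ones; communication overhead eats the rest.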
Distributed deep learning has progressed at a rate of about 2.5 times per year since 2009, when GPUs went from video game graphics accelerators to deep learning model trainers.
What technology do we need to develop in order to continue this rate of progress and go beyond the GPU?
IBM Research believes that this transition beyond the GPU will happen in three stages.
1. in the near term, continue to utilize GPUs and build new accelerators with conventional CMOS
In 2015, Suyog Gupta et al. demonstrated in their ICML paper Deep Learning with Limited Numerical Precision that reduced-precision models can match the accuracy of today’s standard 64-bit models while using as few as 14 bits of floating-point precision. IBM sees this reduced-precision, faster-computation trend sustaining the 2.5X-per-year improvement at least through the year 2022.
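The core idea is that storing weights and activations at low precision costs surprisingly little accuracy. A minimal sketch, using NumPy's float16 as a stand-in for the low-precision formats discussed (the paper's 14-bit format is not a NumPy dtype), with accumulation kept at higher precision as mixed-precision training typically does:

```python
import numpy as np

# Sketch: a dot product with values quantized to 16-bit storage but
# accumulated in 64-bit, compared against a full 64-bit reference.
# float16 is an illustrative stand-in for reduced-precision formats.

rng = np.random.default_rng(0)
w = rng.random(1024)  # "weights" in [0, 1)
x = rng.random(1024)  # "activations" in [0, 1)

full = float(np.dot(w, x))  # 64-bit reference result

# Round each value to 16-bit storage, then accumulate in 64-bit.
w16 = w.astype(np.float16).astype(np.float64)
x16 = x.astype(np.float16).astype(np.float64)
low = float(np.dot(w16, x16))

rel_err = abs(full - low) / abs(full)
print(f"relative error from 16-bit storage: {rel_err:.4%}")
```

The relative error stays well under one percent here, which is why low-precision arithmetic can buy large speed and power savings at little cost in model quality.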
2. look for ways to exploit low precision and analog devices to further lower power and improve performance
Phase-change memory, a next-generation memory material, may be the first analog device optimized for deep learning networks. How can a memory, the very bottleneck of the von Neumann architecture, improve machine learning? Because IBM has figured out how to bring computation to the memory. Recently, IBM scientists demonstrated in-memory computing with 1 million devices for applications in AI, publishing their results, Temporal correlation detection using computational phase-change memory, in Nature Communications, and also presenting them in the IEDM session Compressed Sensing Recovery Using Computational Memory.
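The appeal of computing in memory is that a crossbar of analog devices performs a whole matrix-vector multiply in one step: weights live as device conductances G, input voltages V are applied, and the output currents are I = G @ V by Ohm's and Kirchhoff's laws. The price is device noise. A minimal sketch of that trade-off, with a simple Gaussian noise model rather than real PCM device characteristics:

```python
import numpy as np

# Sketch: matrix-vector multiply "in memory" on a noisy analog crossbar.
# Weights are stored as conductances G; applying voltages V yields output
# currents I = (G + noise) @ V in a single parallel analog read.
# The noise model and all values are illustrative, not PCM device data.

rng = np.random.default_rng(1)
G = rng.random((4, 8))  # target conductance matrix (the "weights")
V = rng.random(8)       # input voltages (the "activations")

noise = 0.02 * rng.standard_normal(G.shape)  # device programming error
I_analog = (G + noise) @ V  # one-shot analog readout
I_exact = G @ V             # exact digital reference

max_dev = float(np.max(np.abs(I_analog - I_exact)))
print(f"worst-case deviation from exact result: {max_dev:.4f}")
```

Neural networks tolerate this kind of small, unbiased error well, which is what makes noisy analog hardware a plausible fit for deep learning inference.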
Analog computing’s maturity should extend the 2.5X-per-year machine learning improvement for a few more years, to roughly 2026.
3. enter the quantum computing era, which will potentially offer entirely new approaches
IBM is at 50 qubits now and could have 100 qubits later this year.
Systems of 50-100 qubits will enable superior quantum chemistry for better materials analysis; indeed, IBM’s Jeff Welser said the first real application is likely to be materials analysis using quantum chemistry.