Accelerating neural networks with hardware isn’t new to IBM. The company recently announced the sale of some of its TrueNorth chips to Lawrence Livermore National Laboratory for AI research. TrueNorth’s design is neuromorphic, meaning the chips roughly approximate the brain’s architecture of neurons and synapses. Despite its slow clock rate of 1 kHz, TrueNorth runs neural networks very efficiently thanks to its million tiny processing units, each of which emulates a neuron.
IBM researchers Tayfun Gokmen and Yuri Vlasov propose a new chip architecture that uses resistive computing to create tiles of millions of Resistive Processing Units (RPUs), which can be used both to train and to run neural networks.
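The article doesn’t spell out how a resistive tile computes, so the following is an illustrative sketch of the general crossbar idea rather than IBM’s actual design: weights are stored as conductances at row/column crossings, input voltages are applied to the rows, and the current summed on each column yields one element of a vector-matrix product in a single analog step (Ohm’s law does the multiplications, Kirchhoff’s current law the additions).

```python
# Illustrative sketch (not IBM's actual design): an RPU-style tile stores
# each neural-network weight as the conductance of a resistive element at
# a row/column crossing.  Driving the rows with input voltages and summing
# the current on each column performs a vector-matrix multiply in one
# analog step: I_j = sum_i G[i][j] * V[i].

def crossbar_forward(voltages, conductances):
    """Column currents of an idealized crossbar: I_j = sum_i G[i][j] * V[i]."""
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    return [sum(conductances[i][j] * voltages[i] for i in range(n_rows))
            for j in range(n_cols)]

# A tiny 3x2 "tile": weights encoded as conductances (arbitrary units).
G = [[1, 2],
     [3, 4],
     [5, 6]]
V = [1.0, 2.0, 3.0]  # input activations applied as row voltages

print(crossbar_forward(V, G))  # -> [22.0, 28.0]
```

A digital chip computing the same product needs one multiply-accumulate per weight; the crossbar does them all at once, which is where the efficiency claims below come from.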
A “human-scale” simulation with 100 trillion synapses (using relatively simple models of neurons and synapses) required 96 Blue Gene/Q racks of the Lawrence Livermore National Lab Sequoia supercomputer—and, even then, the simulation ran 1,500 times slower than real time. A hypothetical computer able to run this simulation in real time would require 12 GW, whereas the human brain consumes merely 20 W.
The proposed RPU design is expected to accommodate a variety of deep neural network (DNN) architectures, including fully connected and convolutional networks, which makes it potentially useful across nearly the entire spectrum of neural network applications. Using existing CMOS technology, and assuming RPUs in 4,096-by-4,096-element tiles with an 80-nanosecond cycle time, a single tile could execute about 51 GigaOps per second while drawing a minuscule amount of power. A chip with 100 tiles and a single complementary CPU core could handle a network with up to 16 billion weights while consuming only 22 watts (just two of which come from the RPUs themselves; the rest is from the CPU core needed to move data on and off the chip and provide overall control).
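The quoted tile throughput can be reproduced with back-of-envelope arithmetic, under one assumption not stated in the article: that an “op” is counted as one row (or column) of the tile processed per 80-nanosecond cycle.

```python
# Back-of-envelope arithmetic on the figures quoted above.  How an "op"
# is counted is an assumption here, not something the article specifies:
# one op per row of the tile per cycle.

cycle_time_s = 80e-9   # 80-nanosecond tile cycle time
rows = 4096            # 4,096-by-4,096-element tile

ops_per_second_per_tile = rows / cycle_time_s
print(f"{ops_per_second_per_tile / 1e9:.1f} GigaOps/s per tile")  # ~51.2

tiles_per_chip = 100
chip_ops_per_second = tiles_per_chip * ops_per_second_per_tile
print(f"{chip_ops_per_second / 1e12:.2f} TeraOps/s per chip")     # ~5.12
```

Under that reading, 4,096 rows every 80 ns works out to roughly the 51 GigaOps per second per tile cited, and about 5 TeraOps per second for the 100-tile chip.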
Using chips densely packed with these RPU tiles, the researchers estimate that a resistive-computing-based AI system, once built, could achieve performance improvements of up to 30,000 times over current architectures, with a power efficiency of 84,000 GigaOps per second per watt.
SOURCES - IBM, ExtremeTech