IBM will invest $2 billion to develop artificial intelligence hardware, with the goal of improving AI performance efficiency 1,000-fold over the next ten years. IBM is partnering with the State University of New York (SUNY) to develop an AI Hardware Center at SUNY Polytechnic Institute in Albany. New York State will also provide a subsidy of $300 million.

The IBM Research AI Hardware Center will enable IBM and its partner ecosystem to pursue a 1,000x improvement in AI performance efficiency over the next decade. They aim to overcome current machine-learning limitations by using approximate computing with Digital AI Cores and in-memory computing with Analog AI Cores.

Approximate Computing with Digital AI Cores

The best hardware platforms for training deep neural networks (DNNs) have recently moved from traditional single-precision (32-bit) computation to 16-bit precision, which is more energy efficient and uses less memory. IBM researchers have successfully trained DNNs using 8-bit floating point numbers (FP8) while fully maintaining accuracy across deep learning models and datasets.

Approximate computing trades numerical precision for higher computational throughput, but it requires algorithmic improvements to preserve model accuracy. This approach also complements other approximate computing techniques and can lead to a 40-200x speedup over existing compression methods.
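To make the precision trade-off concrete, here is a minimal Python sketch that rounds a number to a chosen number of mantissa bits and reports the relative error this introduces. The helpers `round_mantissa` and `relative_error` are hypothetical names invented for illustration. The sketch ignores exponent range and does not reproduce IBM's FP8 format or training method. It only shows how rounding error grows as mantissa bits are removed.

```python
import math


def round_mantissa(x: float, bits: int) -> float:
    """Round x so its mantissa keeps only `bits` binary digits (hypothetical helper)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    # Snap the mantissa to the nearest multiple of 2**-bits, then rebuild the float.
    return math.ldexp(round(m * scale) / scale, e)


def relative_error(x: float, bits: int) -> float:
    """Relative error introduced by rounding x to `bits` mantissa bits."""
    return abs(round_mantissa(x, bits) - x) / abs(x)


if __name__ == "__main__":
    # The bit widths below are arbitrary illustration values, not hardware formats.
    for bits in (16, 8, 3):
        print(bits, round_mantissa(0.3, bits), relative_error(0.3, bits))
```

In this simplified model the relative error of a single rounding stays at or below 2**-bits, which is why aggressive precision cuts need the algorithmic compensation described above.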

IBM Research AI Hardware Center is developing a roadmap for 1,000x improvement in AI compute performance efficiency over the next decade, with a pipeline of Digital AI Cores and Analog AI Cores.

IBM's analog AI cores are part of an in-memory computing approach that improves performance efficiency by suppressing the so-called von Neumann bottleneck: data no longer has to be transferred to and from memory. Deep neural networks are mapped to analog cross-point arrays, and new non-volatile material characteristics are toggled to store network parameters in the cross points.
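The idea of computing inside a cross-point array can be sketched in a few lines of Python. In an idealized array, each cross point holds a conductance that represents a weight, input voltages are applied to the rows, and each column collects a current equal to the sum of voltage times conductance along it (Ohm's law plus Kirchhoff's current law). The function name `crossbar_mvm` is hypothetical, and the model assumes ideal devices with no noise, wire resistance, or device-to-device variability. It is not a description of IBM's hardware.

```python
def crossbar_mvm(conductances, voltages):
    """Idealized cross-point array: column currents I_j = sum_i V_i * G[i][j].

    conductances: list of rows, one conductance per (row, column) cross point.
    voltages: one input voltage per row.
    """
    if len(conductances) != len(voltages):
        raise ValueError("need exactly one voltage per row")
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for i, v in enumerate(voltages):
        for j in range(n_cols):
            # Each cross point contributes V_i * G_ij to its column's current.
            currents[j] += v * conductances[i][j]
    return currents


if __name__ == "__main__":
    weights = [[1.0, 2.0],
               [3.0, 4.0]]
    print(crossbar_mvm(weights, [1.0, 1.0]))  # [4.0, 6.0]
```

The matrix-vector product happens where the weights are stored, so in this picture no weight ever has to be fetched from a separate memory.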

In-memory Computing with Analog AI Cores

Initial IBM estimates for the potential of chips based on analog non-volatile memories (NVM) for training fully connected layers exceed the specifications of today’s GPUs by two orders of magnitude. IBM calculated an expected computational energy efficiency of 28,065 GOP/sec/W and a throughput per area of 3.6 TOP/sec/mm².

IBM has combined long-term storage in phase-change memory (PCM) devices, near-linear updates of conventional complementary metal-oxide-semiconductor (CMOS) capacitors, and novel techniques for cancelling out device-to-device variability.

Nature – Equivalent-accuracy accelerated neural-network training using analogue memory

Google Tensor Chips and Pods

Google’s third-generation Tensor Processing Unit (TPU) chips deliver about 90 trillion operations per second and use about 200 watts. This works out to about 450 GOP/sec/W.
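The efficiency figure is simple arithmetic, reproduced here as a short Python sketch alongside IBM's estimate of 28,065 GOP/sec/W for analog NVM chips. The function name `gops_per_watt` is a hypothetical helper. The two numbers are not measured on the same workload (one is an approximate chip specification, the other a projected estimate for training fully connected layers), so the ratio is only a rough indication.

```python
def gops_per_watt(ops_per_second: float, watts: float) -> float:
    """Convert raw operations per second and power draw into GOP/sec/W."""
    return ops_per_second / watts / 1e9


# Approximate TPU figures quoted in the article.
tpu_efficiency = gops_per_watt(90e12, 200)

# IBM's estimated efficiency for analog NVM-based chips, from the article.
ibm_estimate = 28_065.0

ratio = ibm_estimate / tpu_efficiency

if __name__ == "__main__":
    print(f"TPU: {tpu_efficiency:.0f} GOP/sec/W")
    print(f"IBM estimate is about {ratio:.0f}x higher")
```

With the article's numbers, IBM's estimate comes out roughly 62 times the TPU figure.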

Written by Christina Wong of Nextbigfuture.com
