IBM Research’s New Prototype AI Chip With 14 Times Better Energy Efficiency

IBM has shown that analog AI chips can handle natural-language AI tasks with an estimated 14 times better energy efficiency. Researchers from IBM labs around the world presented a prototype analog AI chip for energy-efficient speech recognition and transcription. The design was used in two AI inference experiments, and in both the analog chips performed the tasks just as reliably as comparable all-digital devices, while finishing faster and using less energy.

The design created by the IBM Research team packs 35 million phase-change memory devices onto each chip, enough to encode models with up to 17 million parameters. While this isn’t yet comparable in size to today’s cutting-edge generative AI models, combining several of these chips has allowed the team to run experiments on real AI use cases as effectively as digital chips could.

IBM has optimized the multiply-accumulate (MAC) operations that dominate deep-learning compute. In computing, and especially in digital signal processing, a MAC operation computes the product of two numbers and adds it to an accumulator, the part of the CPU that holds intermediate arithmetic results; MACs are a fundamental unit of computation. By reading the rows of an array of resistive non-volatile memory (NVM) devices and then collecting currents along the columns, the team showed they can perform MACs within the memory itself. This eliminates the need to move the weights between memory and compute regions of a chip, or across chips. The analog chips can also carry out many MAC operations in parallel, which saves time and energy.
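
To make the row-and-column picture concrete, here is a minimal Python sketch of a resistive crossbar performing MACs. It is an idealized illustration and not IBM's design: the function names, the single conductance per weight, and the example values are all assumptions, and effects such as signed weights, device noise, and analog-to-digital conversion are left out. Each device passes a current equal to its conductance times the row voltage (Ohm's law), and the currents on a column add up (Kirchhoff's current law), so each column current is a complete dot product.

```python
# Idealized sketch of in-memory multiply-accumulate on a resistive crossbar.
# Assumptions (not from the article): one conductance per weight, no noise,
# no signed weights, and no analog-to-digital conversion.


def mac(accumulator, a, b):
    """One multiply-accumulate step: return accumulator + a * b."""
    return accumulator + a * b


def crossbar_column_currents(conductances, row_voltages):
    """Return the current collected on each column of the array.

    conductances[i][j] is the device at row i, column j (the stored weight).
    row_voltages[i] is the input applied to row i.
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(row_voltages) != n_rows:
        raise ValueError("need exactly one voltage per row")

    currents = [0.0] * n_cols
    # Hardware produces all column currents at once; this loop only
    # simulates that result serially, one MAC at a time.
    for j in range(n_cols):
        for i in range(n_rows):
            currents[j] = mac(currents[j], conductances[i][j], row_voltages[i])
    return currents


if __name__ == "__main__":
    G = [[1.0, 2.0],
         [3.0, 4.0],
         [5.0, 6.0]]      # hypothetical 3x2 weight array
    V = [1.0, 0.5, 2.0]   # hypothetical input activations
    print(crossbar_column_currents(G, V))  # [12.5, 16.0]
```

The nested loop performs six MACs in sequence, whereas the analog array obtains both column sums in a single read, without moving the weights out of memory.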

Nature – An analog-AI chip for energy-efficient speech recognition and transcription

Abstract
Models of artificial intelligence (AI) that have billions of parameters can achieve high accuracy across a range of tasks but they exacerbate the poor energy efficiency of conventional general-purpose processors, such as graphics processing units or central processing units. Analog in-memory computing (analog-AI) can provide better energy efficiency by performing matrix–vector multiplications in parallel on ‘memory tiles’. However, analog-AI has yet to demonstrate software-equivalent (SWeq) accuracy on models that require many such tiles and efficient communication of neural-network activations between the tiles. Here we present an analog-AI chip that combines 35 million phase-change memory devices across 34 tiles, massively parallel inter-tile communication and analog, low-power peripheral circuitry that can achieve up to 12.4 tera-operations per second per watt (TOPS/W) chip-sustained performance. We demonstrate fully end-to-end SWeq accuracy for a small keyword-spotting network and near-SWeq accuracy on the much larger MLPerf recurrent neural-network transducer (RNNT), with more than 45 million weights mapped onto more than 140 million phase-change memory devices across five chips.