58.8 Times Faster AI Training With New Chip Architecture

A brain-inspired computing architecture speeds up complex data processing by running its algorithms inside its memory, significantly saving time and energy.

Above: Software and hardware overview of DUAL, based on hyperdimensional computing. Credit: IEEE/ACM International Symposium on Microarchitecture 2020

Scientists at DGIST in Korea, and UC Irvine and UC San Diego in the US, have developed a computer architecture that runs unsupervised machine learning algorithms faster, while consuming significantly less energy, than state-of-the-art graphics processing units. The key is processing data where it is stored, in computer memory, and in an all-digital format. The researchers presented the new architecture, called DUAL, at the IEEE/ACM International Symposium on Microarchitecture 2020.

Scientists have been looking into processing-in-memory (PIM) approaches. But most PIM architectures are analog-based and require analog-to-digital and digital-to-analog converters, which take up a large share of the chip's power and area. They also work better with supervised machine learning, which relies on labeled datasets to help train the algorithm.

To overcome these issues, Yeseong Kim of DGIST and his colleagues developed DUAL, which stands for digital-based unsupervised learning acceleration. DUAL enables computations on digital data stored inside a computer memory. It works by mapping all the data points into high-dimensional space; as an analogy, imagine each data point being stored across many locations within the human brain.
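The mapping into high-dimensional space can be illustrated with a minimal sketch. The article does not describe DUAL's actual encoder, so the example below assumes a common hyperdimensional-computing technique: taking the sign of random projections to turn a feature vector into a long binary vector. The names `make_projection`, `encode`, and `hamming`, the dimension of 2048, and the sample points are all hypothetical choices made for illustration.

```python
import random


def make_projection(n_features, dim, seed=0):
    """Build `dim` random Gaussian directions, one per output bit (assumed encoder)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(dim)]


def encode(point, projection):
    """Map a feature vector to a binary hypervector: one bit per random direction."""
    return [
        1 if sum(w * x for w, x in zip(row, point)) >= 0 else 0
        for row in projection
    ]


def hamming(a, b):
    """Count the positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


projection = make_projection(n_features=3, dim=2048)
a = encode([1.0, 2.0, 3.0], projection)     # reference point
b = encode([1.1, 2.1, 2.9], projection)     # close to the reference
c = encode([-3.0, 0.5, -2.0], projection)   # far from the reference

print(hamming(a, b), hamming(a, c))
```

Under this assumed encoding, points that are similar in the original space end up with binary vectors that differ in few bits, so a bit-counting comparison can stand in for a more expensive distance calculation.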

The scientists found that DUAL speeds up many different clustering algorithms across a wide range of large-scale datasets, running 58.8 times faster and with 251.2 times better energy efficiency than a state-of-the-art graphics processing unit. The researchers believe this is the first digital-based PIM architecture that can accelerate unsupervised machine learning.

It is not clear whether the DUAL architecture offers improvements that would carry over to the Cerebras wafer-scale chips. The Cerebras wafer-scale AI chips have 18 gigabytes of on-chip memory. Those would be an ideal platform for implementing superior processing in memory.

Abstract:
Today’s applications generate a large amount of data that need to be processed by learning algorithms. In practice, the majority of the data are not associated with any labels. Unsupervised learning, i.e., clustering methods, are the most commonly used algorithms for data analysis. However, running clustering algorithms on traditional cores results in high energy consumption and slow processing speed due to a large amount of data movement between memory and processing units. In this paper, we propose DUAL, a Digital-based Unsupervised learning AcceLeration, which supports a wide range of popular algorithms on conventional crossbar memory. Instead of working with the original data, DUAL maps all data points into high-dimensional space, replacing complex clustering operations with memory-friendly operations. We accordingly design a PIM-based architecture that supports all essential operations in a highly parallel and scalable way. DUAL supports a wide range of essential operations and enables in-place computations, allowing data points to remain in memory. We have evaluated DUAL on several popular clustering algorithms for a wide range of large-scale datasets. Our evaluation shows that DUAL provides a comparable quality to existing clustering algorithms while using a binary representation and a simplified distance metric. DUAL also provides 58.8× speedup and 251.2× energy efficiency improvement as compared to the state-of-the-art solution running on GPU.
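To make the abstract's "binary representation and a simplified distance metric" more concrete, here is a minimal sketch of clustering directly on binary vectors. It is not the paper's implementation: it assumes Hamming distance as the simplified metric and a bitwise majority vote as the centroid update, in a k-means-style loop. The function names, the dimension of 512, and the synthetic data are hypothetical.

```python
import random


def hamming(a, b):
    """Count the positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def majority(vectors):
    """Bitwise majority vote over a group of binary vectors; ties resolve to 1."""
    n = len(vectors)
    return [1 if 2 * sum(bits) >= n else 0 for bits in zip(*vectors)]


def cluster(vectors, centers, iterations=10):
    """k-means-style loop: assign by Hamming distance, update by majority vote."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centers)), key=lambda k: hamming(v, centers[k]))
            for v in vectors
        ]
        for k in range(len(centers)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # leave a center unchanged if no vector was assigned to it
                centers[k] = majority(members)
    return labels, centers


# Synthetic data: two random prototypes, each with five noisy copies.
rng = random.Random(1)
DIM = 512


def noisy(proto, flips):
    """Return a copy of `proto` with `flips` randomly chosen bits inverted."""
    v = list(proto)
    for i in rng.sample(range(DIM), flips):
        v[i] ^= 1
    return v


protos = [[rng.randint(0, 1) for _ in range(DIM)] for _ in range(2)]
data = [noisy(protos[0], 50) for _ in range(5)] + [noisy(protos[1], 50) for _ in range(5)]
labels, centers = cluster(data, centers=[data[0], data[-1]])
print(labels)
```

Both steps reduce to bit comparisons and counting, which is the kind of operation the abstract describes as memory-friendly. How DUAL maps these steps onto crossbar memory is not covered by this sketch.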