The Nervana Engine offers faster data access via high-bandwidth memory. Training deep neural networks involves moving a lot of data, and current memory technologies are simply not up to the task. The Nervana Engine uses a new memory technology called High Bandwidth Memory (HBM) that is both high-capacity and high-speed, providing 32 GB of in-package storage and a blazingly fast 8 terabits per second of memory bandwidth.
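To put that bandwidth figure in perspective, a quick back-of-the-envelope calculation (assuming decimal units, i.e. 1 Tb = 10^12 bits) shows how fast the entire memory can be swept:

```python
# Back-of-the-envelope: time to read the entire 32 GB memory at 8 Tb/s.
# Assumes decimal units (1 GB = 1e9 bytes, 1 Tb = 1e12 bits).

capacity_bytes = 32e9          # 32 GB of HBM
bandwidth_bits_per_s = 8e12    # 8 terabits per second
bandwidth_bytes_per_s = bandwidth_bits_per_s / 8  # = 1 TB/s

sweep_time_s = capacity_bytes / bandwidth_bytes_per_s
print(f"Full-memory sweep: {sweep_time_s * 1e3:.0f} ms")  # -> 32 ms
```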
The Nervana Engine devotes its die area mostly to multipliers and local memory, omitting elements such as caches that graphics processing needs but deep learning does not. As a result, the Nervana Engine achieves unprecedented compute density and an order of magnitude more raw computing power than today's state-of-the-art GPUs.
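The reason this trade-off works is that deep learning workloads are dominated by multiply-accumulate operations with regular, predictable access patterns, so caches buy little. A rough sketch of the arithmetic (the layer sizes below are illustrative assumptions, not Nervana figures):

```python
# Rough multiply-accumulate (MAC) count for one fully connected layer,
# illustrating why deep learning hardware is dominated by multipliers.
# Layer sizes are illustrative assumptions, not Nervana figures.

batch_size = 128
n_in, n_out = 4096, 4096

# A forward pass of y = x @ W is a (batch_size x n_in) by (n_in x n_out)
# matrix multiply: one MAC per (batch, in, out) triple.
macs = batch_size * n_in * n_out
print(f"{macs / 1e9:.1f} billion MACs per forward pass")  # ~2.1 billion
```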
Nervana Systems, the startup behind the chip, has raised $28 million in funding.
The Nervana Engine has separate pipelines for computation and data management, so new data is always being staged while the current data is being processed. This pipeline isolation, combined with ample local memory, lets the Nervana Engine run near its theoretical maximum throughput much of the time.
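The general pattern here is double buffering: while compute consumes one buffer, the data pipeline fills the next. The Nervana Engine does this in hardware; the following is only a minimal software sketch of the pattern:

```python
import threading
import queue

def data_pipeline(batches, staging: queue.Queue):
    """Producer: stages batches into local memory ahead of compute."""
    for batch in batches:
        staging.put(batch)   # blocks if compute falls behind
    staging.put(None)        # sentinel: no more data

def compute_pipeline(staging: queue.Queue, process):
    """Consumer: always finds the next batch already staged."""
    while (batch := staging.get()) is not None:
        process(batch)

# A bounded queue acts as a pair of staging buffers (double buffering).
staging = queue.Queue(maxsize=2)
batches = range(10)  # stand-in for a real data source
producer = threading.Thread(target=data_pipeline, args=(batches, staging))
producer.start()
compute_pipeline(staging, process=lambda b: print("computing on batch", b))
producer.join()
```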
The Nervana Engine includes six bidirectional high-bandwidth links, enabling ASICs to be interconnected so that data can move seamlessly between chips, and even between chassis. This lets users get a linear speedup on their current models simply by assigning more compute to the task, or expand their models to unprecedented sizes without any decrease in speed. Competing systems route all communication over oversubscribed, low-bandwidth PCIe buses, which greatly limits their ability to improve performance by adding more hardware.
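Link bandwidth dominates multi-chip scaling because, in data-parallel training, every device must exchange a full copy of the gradients each step. A rough estimate of that cost (model size and link speeds below are assumptions chosen for illustration, not measured figures):

```python
# Rough estimate of per-step gradient synchronization time in
# data-parallel training. All numbers are illustrative assumptions.

n_params = 100e6          # 100M-parameter model (assumed)
bytes_per_param = 4       # fp32 gradients
grad_bytes = n_params * bytes_per_param  # 400 MB per step

pcie_bw = 8e9             # ~8 GB/s effective PCIe bandwidth (assumed)
fast_link_bw = 100e9      # hypothetical high-bandwidth chip-to-chip link

for name, bw in [("PCIe", pcie_bw), ("direct link", fast_link_bw)]:
    print(f"{name}: {grad_bytes / bw * 1e3:.1f} ms per gradient exchange")
# PCIe:        50.0 ms per gradient exchange
# direct link:  4.0 ms per gradient exchange
```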
Nervana Systems also offers Neon, which the company claims is the world's fastest deep learning framework. Neon is an open-source, Python-based language and set of libraries for developing deep learning models.
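A small model definition in the style of neon's documented MNIST example gives a feel for the framework; the exact imports and signatures vary by neon version, so treat this as a sketch rather than canonical API:

```python
# Sketch of a small multilayer perceptron in neon, adapted from the
# style of neon's documented examples; exact APIs vary by version.
from neon.backends import gen_backend
from neon.initializers import Gaussian
from neon.layers import Affine, GeneralizedCost
from neon.models import Model
from neon.optimizers import GradientDescentMomentum
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti

be = gen_backend(backend='cpu', batch_size=128)  # or 'gpu'

init = Gaussian(loc=0.0, scale=0.01)
layers = [Affine(nout=100, init=init, activation=Rectlin()),
          Affine(nout=10, init=init, activation=Softmax())]
mlp = Model(layers=layers)

cost = GeneralizedCost(costfunc=CrossEntropyMulti())
optimizer = GradientDescentMomentum(learning_rate=0.1, momentum_coef=0.9)
# train_set would be a neon data iterator built from a dataset such as
# MNIST; training then runs as:
# mlp.fit(train_set, optimizer=optimizer, cost=cost, num_epochs=10)
```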
SOURCE: Nervana Systems