Nvidia’s Pascal will be ten times faster than Maxwell processors at deep learning

NVIDIA’s Pascal GPU architecture, set to debut in 2016, will accelerate deep learning applications 10X beyond the speed of its current-generation Maxwell processors. Beyond Pascal comes Volta, which will use stacked DRAM and achieve a further two- to three-fold boost.

Pascal GPUs will have three key design features that will result in dramatically faster, more accurate training of richer deep neural networks – the human cortex-like data structures that serve as the foundation of deep learning research.

Along with up to 32GB of memory, 2.7X as much as the newly launched NVIDIA flagship, the GeForce GTX TITAN X, Pascal will feature mixed-precision computing. It will have 3D memory, resulting in up to 5X improvement in deep learning applications. And it will feature NVLink, NVIDIA’s high-speed interconnect that links together two or more GPUs, which will lead to a total 10X improvement in deep learning.

Pascal is Nvidia’s follow-up to Maxwell, and the first desktop chip to use TSMC’s 16nmFF+ (FinFET+) process. This is the second-generation follow-up to TSMC’s first FinFET technology — the first generation is expected to be available this year, while FF+ won’t ship until sometime next year.

Mixed-Precision Computing – for Greater Accuracy

Mixed-precision computing enables Pascal architecture-based GPUs to perform 16-bit floating point calculations at twice the rate of 32-bit floating point calculations.

Increased floating point performance particularly benefits classification and convolution – two key activities in deep learning – while achieving needed accuracy.
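
To make the 16-bit versus 32-bit trade-off concrete, here is a minimal Python sketch. It is not NVIDIA code and says nothing about how Pascal implements mixed precision; it only uses the standard library's IEEE 754 half- and single-precision formats to show that a 16-bit value takes half the storage of a 32-bit one and rounds more coarsely. The sample value 0.1 and the helper name round_trip are illustrative assumptions.

```python
import struct

# '<e' is IEEE 754 half precision (16-bit); '<f' is single precision (32-bit).
HALF, SINGLE = "<e", "<f"


def round_trip(value: float, fmt: str) -> float:
    """Pack a Python float into the given format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


half_bytes = struct.calcsize(HALF)      # storage for one 16-bit float
single_bytes = struct.calcsize(SINGLE)  # storage for one 32-bit float

value = 0.1  # illustrative value, not taken from the article
half_error = abs(round_trip(value, HALF) - value)
single_error = abs(round_trip(value, SINGLE) - value)

print(f"FP16: {half_bytes} bytes per value, rounding error {half_error:.2e}")
print(f"FP32: {single_bytes} bytes per value, rounding error {single_error:.2e}")
```

Half the bytes per value means twice as many values per memory transfer, which is one reason lower precision can raise throughput when the reduced accuracy is still sufficient.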

3D Memory – for Faster Communication Speed and Power Efficiency

Memory bandwidth constraints limit the speed at which data can be delivered to the GPU. The introduction of 3D memory will provide 3X the bandwidth and nearly 3X the frame buffer capacity of Maxwell. This will let developers build even larger neural networks and accelerate the bandwidth-intensive portions of deep learning training.

Pascal will have its memory chips stacked on top of each other, and placed adjacent to the GPU, rather than further down the processor boards. This reduces from inches to millimeters the distance that bits need to travel as they traverse from memory to GPU and back. The result is dramatically accelerated communication and improved power efficiency.
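
As a rough illustration of what the extra capacity could mean for network size, the following Python sketch computes an upper bound on how many parameters fit in a given amount of GPU memory. It assumes decimal gigabytes, uses the TITAN X's 12GB and the "up to 32GB" Pascal figure, and ignores activations, gradients and other training state, so real limits would be lower. The function name max_parameters is hypothetical.

```python
GB = 10**9  # assumption: decimal gigabytes

TITAN_X_GB = 12  # GeForce GTX TITAN X memory
PASCAL_GB = 32   # "up to 32GB" figure quoted for Pascal


def max_parameters(memory_gb: int, bytes_per_param: int) -> int:
    """Upper bound on parameters that fit if memory held nothing else."""
    return memory_gb * GB // bytes_per_param


print(f"Capacity ratio: {PASCAL_GB / TITAN_X_GB:.1f}X")
for name, memory_gb in (("TITAN X", TITAN_X_GB), ("Pascal", PASCAL_GB)):
    for label, size in (("FP32", 4), ("FP16", 2)):
        billions = max_parameters(memory_gb, size) / 1e9
        print(f"{name} {label}: up to {billions:.0f} billion parameters")
```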

NVLink – for Faster Data Movement

The addition of NVLink to Pascal will let data move between GPUs and CPUs five to 12 times faster than it can over today’s standard, PCI-Express. This greatly benefits applications, such as deep learning, that have high inter-GPU communication needs.

NVLink allows for double the number of GPUs in a system to work together in deep learning computations. In addition, CPUs and GPUs can connect in new ways to enable more flexibility and energy efficiency in server design compared to PCI-E.
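
To give a feel for the five-to-12-times claim, this small Python sketch estimates how long a fixed payload would take to move at different link speeds. The PCI-Express baseline of roughly 16 GB/s (about the peak of a PCIe 3.0 x16 slot in one direction) and the 1 GB payload are assumptions for illustration, not figures from NVIDIA, and real transfers fall short of peak rates.

```python
# Assumption: approximate peak of PCIe 3.0 x16 in one direction, in GB/s.
PCIE_GB_PER_S = 16.0


def transfer_ms(payload_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal transfer time in milliseconds, ignoring latency and overhead."""
    return payload_gb / bandwidth_gb_per_s * 1000.0


payload_gb = 1.0  # hypothetical amount of data exchanged between GPUs

# 1X is the PCIe baseline; 5X and 12X are the quoted NVLink range.
for speedup in (1, 5, 12):
    bandwidth = PCIE_GB_PER_S * speedup
    print(f"{speedup:>2}X link ({bandwidth:.0f} GB/s): "
          f"{transfer_ms(payload_gb, bandwidth):.1f} ms")
```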

Pascal’s next improvement will be its use of HBM, or High Bandwidth Memory. Nvidia is claiming it will offer up to 32GB of RAM per GPU at 3x the memory bandwidth. That would put Pascal at close to 1TB/s of theoretical bandwidth, depending on RAM clock, a huge leap forward for all GPUs.
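
The bandwidth estimate can be checked with simple arithmetic. The Python sketch below assumes the GTX TITAN X's published memory configuration (a 384-bit bus at an effective 7 Gbps per pin) as the Maxwell baseline, then applies the claimed 3x multiplier. The baseline figures are assumptions brought in for illustration, not numbers stated above.

```python
def bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth: bus width times per-pin rate, in gigabytes/s."""
    return bus_width_bits * data_rate_gbps / 8


# Assumption: GTX TITAN X memory spec (384-bit bus, 7 Gbps effective GDDR5).
titan_x = bandwidth_gb_per_s(384, 7.0)

# The article's claim: 3x the memory bandwidth of Maxwell.
pascal_estimate = 3 * titan_x

print(f"TITAN X baseline: {titan_x:.0f} GB/s")
print(f"3x estimate:      {pascal_estimate:.0f} GB/s "
      f"(about {pascal_estimate / 1000:.1f} TB/s)")
```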

Pascal should end up in Autopilot and other car-focused systems by 2017

The Tegra X1 processor, which powers Nvidia’s Drive CX cockpit visualisation computer, is capable of delivering one trillion floating-point operations per second.

There are 8 million cars on the road with Nvidia’s processors inside – including models from Tesla, Volkswagen, Honda and Mercedes as well as Audi – but Danny Shapiro, senior director of automotive at Nvidia, claims the company is just getting started. “We have contracts with a lot of automakers, so over the next several years we’re going to grow that number by over 25 million,” he said.