Graphcore Chips Could Speed Up AI by 100 Times

Graphcore says its IPU (Intelligence Processing Unit) accelerators and Poplar software together make the fastest and most flexible platform for current and future machine intelligence applications, lowering the cost of AI in the cloud and datacenter and improving performance and efficiency by 10x to 100x.

Graphcore recently closed a new $200 million funding round, which values the company at $1.7 billion and brings the total capital raised to over $300 million. Atomico, Europe’s leading venture capital firm, jointly led the round, and BMW and Microsoft joined as new strategic investors.

IPUs are specifically designed for machine intelligence workloads and can be used in a wide variety of applications, from intelligent voice assistants to self-driving vehicles.

Graphcore systems are good at both training and inference. The highly parallel computational resources, together with graph software tools and libraries, allow researchers to explore machine intelligence across a much broader front than current solutions. This technology lets recent successes in deep learning evolve rapidly towards useful, general artificial intelligence.

The IPU has over 14,000 independent processor threads and, according to Graphcore, 100 times the memory bandwidth of current solutions.

* Graphs expose huge parallelism
* Sparsity changes
* Computer memory optimized for AI memory access patterns
* Static structure of graph allows compiler to do more work
* Model values are low-resolution
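
The points about graph parallelism and static structure can be illustrated with a small sketch. This is not Poplar code and does not describe how Graphcore's compiler works; it is a generic, minimal example that assumes a hypothetical toy compute graph and a made-up helper named `parallel_levels`. Because the graph is known ahead of time, a tool can group operations into levels whose members do not depend on each other and could, in principle, run in parallel.

```python
from collections import defaultdict


def parallel_levels(nodes, edges):
    """Group the nodes of a static DAG into dependency levels.

    Nodes in the same level have no edges between them, so they
    could be scheduled in parallel. Hypothetical helper, for
    illustration only.
    """
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    # Start with every node that has no inputs.
    level = sorted(n for n in nodes if indegree[n] == 0)
    levels = []
    while level:
        levels.append(level)
        ready = []
        for n in level:
            for child in children[n]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        level = sorted(ready)
    return levels


# Assumed toy graph: two independent matmuls feed an add, then a ReLU.
nodes = ["x", "w1", "w2", "matmul1", "matmul2", "add", "relu"]
edges = [
    ("x", "matmul1"), ("w1", "matmul1"),
    ("x", "matmul2"), ("w2", "matmul2"),
    ("matmul1", "add"), ("matmul2", "add"),
    ("add", "relu"),
]

levels = parallel_levels(nodes, edges)
for depth, ops in enumerate(levels):
    print(depth, ops)
```

In this toy graph the two matmuls land in the same level, which is the kind of parallelism a static graph exposes before any data is processed.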

Dell EMC Partnership With Graphcore

Dell EMC became a strategic investor in Graphcore in 2016. Graphcore has worked closely with teams across Dell Technologies’ businesses. Dell has provided unique global scale, channel, OEM, product integration and go-to-market relationships for Graphcore’s Intelligence Processing Unit (IPU) products.

Early in 2018, Graphcore started shipping its first C2 IPU-Processor PCIe cards to early access customers.

The IPU-based platform contains 8 C2 IPU-Processor PCIe cards, each with 2 Colossus GC2 IPU-Processors. It delivers over 2 petaflops of machine intelligence compute, spread across more than 100,000 independent parallel programs that all work on the machine intelligence model held inside the IPUs. Memory bandwidth of nearly 1 petabyte/sec gives dramatically higher performance and energy efficiency.
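
As a rough check on these figures, the short sketch below derives per-IPU numbers from the platform totals quoted above. The constants are taken from this article; treating "over 2 petaflops" and "over 100,000" as exact lower bounds is an assumption made only for the arithmetic, and the variable names are illustrative.

```python
# Figures quoted in the article (treated as lower bounds).
CARDS_PER_PLATFORM = 8
IPUS_PER_CARD = 2
PLATFORM_PFLOPS = 2.0        # "over 2 petaflops"
PLATFORM_PROGRAMS = 100_000  # "over 100,000 independent parallel programs"

ipus = CARDS_PER_PLATFORM * IPUS_PER_CARD

# Convert petaflops to teraflops, then divide evenly across the IPUs.
tflops_per_ipu = PLATFORM_PFLOPS * 1000 / ipus
programs_per_ipu = PLATFORM_PROGRAMS / ipus

print(f"IPUs in the platform: {ipus}")
print(f"Compute per IPU: at least {tflops_per_ipu:.0f} TFlops")
print(f"Parallel programs per IPU: at least {programs_per_ipu:.0f}")
```

The per-IPU compute figure that falls out of this division matches the 500 TFlops quoted for a 4-IPU machine later in the article.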

IPU Pod Racks

Many of the new machine learning approaches need 100x or 1000x more computing power. This level of performance requires a new solution that can scale to the level of computing power required.

Graphcore’s networking team came up with a design for an IPU-Pod that is completely elastic and can scale out to support massive levels of computing power.

A single 42U rack IPU-Pod delivers over 16 Petaflops of mixed precision compute and a system of 32 IPU-Pods scales to over 500 Petaflops of mixed precision computing power.

The Rackscale IPU-Pod has 32 1U IPU-Machines™. Each 1U server has 4 Colossus GC2 IPU Processors providing 500 TFlops of mixed precision computing power, over 1.2GB of In-Processor Memory™ and an unprecedented memory bandwidth of over 200TB/s.

A 32-rack IPU-Pod system has a total of 4,096 IPU processors and delivers 0.5 exaflops of mixed precision compute.
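
The scaling figures in this section follow from simple multiplication. The sketch below reproduces them from the per-machine numbers quoted above; the constant names are illustrative, and the "over" qualifiers are dropped, so the results should be read as approximate lower bounds.

```python
# Per-machine and per-pod figures quoted in the article.
IPUS_PER_MACHINE = 4
TFLOPS_PER_MACHINE = 500   # mixed precision
MACHINES_PER_POD = 32      # 1U IPU-Machines in one 42U rack
PODS_PER_SYSTEM = 32

ipus_per_pod = IPUS_PER_MACHINE * MACHINES_PER_POD
pod_pflops = TFLOPS_PER_MACHINE * MACHINES_PER_POD / 1000

system_ipus = ipus_per_pod * PODS_PER_SYSTEM
system_pflops = pod_pflops * PODS_PER_SYSTEM
system_exaflops = system_pflops / 1000

print(f"One pod: {ipus_per_pod} IPUs, {pod_pflops:.0f} PFlops")
print(f"{PODS_PER_SYSTEM} pods: {system_ipus} IPUs, "
      f"{system_pflops:.0f} PFlops (~{system_exaflops:.1f} ExaFlops)")
```

The totals line up with the article's 16 petaflops per rack, over 500 petaflops for 32 racks, and 4,096 IPU processors overall.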

SOURCES- Graphcore

Written By Christina Wong.