Google will make 1,000 Cloud TPUs (44 petaflops) available at no cost to ML researchers

Google announced that its second-generation Tensor Processing Units (TPUs) are coming to Google Cloud to accelerate a wide range of machine learning workloads, including both training and inference. Google calls them Cloud TPUs, and they will initially be available via Google Compute Engine.

Developers can program these TPUs with TensorFlow, the most popular open-source machine learning framework on GitHub, and Google is introducing high-level APIs that will make it easier to train machine learning models on CPUs, GPUs, or Cloud TPUs with only minimal code changes.
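To make the "minimal code changes" idea concrete, here is a rough sketch using TensorFlow's high-level Estimator API of that era: the model function is written once and the framework handles device placement on CPU or GPU, while the announced TPU path reuses the same model function through a TPU-specific Estimator variant. The model, the synthetic data, and the `/tmp/demo_model` directory below are purely illustrative, not Google's published example.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x-era API


def model_fn(features, labels, mode):
    # A tiny linear classifier; the same model_fn runs unchanged on CPU or GPU,
    # and the TPU path reuses it through a TPU-specific Estimator wrapper.
    logits = tf.layers.dense(features["x"], units=10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)


# Synthetic data just to make the sketch runnable.
train_x = {"x": np.random.rand(256, 784).astype(np.float32)}
train_y = np.random.randint(0, 10, size=(256,))

input_fn = tf.estimator.inputs.numpy_input_fn(
    x=train_x, y=train_y, batch_size=32, num_epochs=None, shuffle=True)

estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir="/tmp/demo_model")
estimator.train(input_fn=input_fn, steps=100)
```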

Cloud TPUs can be connected to virtual machines of all shapes and sizes and mixed and matched with other types of hardware, including Skylake CPUs and NVIDIA GPUs.

While Cloud TPUs will benefit many machine learning applications, Google remains committed to offering a wide range of hardware on Google Cloud so customers can choose the accelerators that best fit their particular use case at any given time. For example, Shazam recently announced that it successfully migrated major portions of its music recognition workloads to NVIDIA GPUs on Google Cloud, saving money while gaining flexibility.

To help as many researchers as possible and further accelerate the pace of open machine learning research, Google will make 1,000 Cloud TPUs (44 petaflops) available at no cost to ML researchers via the TensorFlow Research Cloud.

Google has designed and deployed a second generation of its Tensor Processing Unit (TPU) and is giving access to the machine-learning ASIC as a cloud service for commercial customers and researchers. A board carrying four of the chips, which Google calls a Cloud TPU, delivers 180 teraflops and will be used for both training and inference tasks.

* 24 second-generation TPU chips would deliver over 1 petaflop
* 256 second-generation TPU chips in a cluster can deliver 11.5 petaflops (see the arithmetic check below)
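These figures line up with the 180-teraflop number if that rating applies to a four-chip Cloud TPU board, i.e. roughly 45 teraflops per chip. A quick back-of-envelope check (the per-board assumption is mine, based on the figures quoted above):

```python
# Back-of-envelope check of the throughput figures above, assuming the
# 180-teraflop rating is for a four-chip Cloud TPU board (45 TFLOPS per chip).
TFLOPS_PER_BOARD = 180.0
CHIPS_PER_BOARD = 4
tflops_per_chip = TFLOPS_PER_BOARD / CHIPS_PER_BOARD  # 45 TFLOPS

for chips in (24, 256):
    petaflops = chips * tflops_per_chip / 1000.0
    print("%d chips -> %.2f petaflops" % (chips, petaflops))

# 24 chips  -> 1.08 petaflops  (just over 1 petaflop)
# 256 chips -> 11.52 petaflops (the quoted pod figure)
```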

The effort aims to harness rising interest in machine learning to drive use of Google’s cloud services. It also aims to rally more users around its open-source TensorFlow framework, the only software interface that the new chip supports.

The Cloud TPU supports floating-point math, which Google encourages for both training and inference jobs to simplify deployment. The first-gen ASIC used quantized integer math.
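The contrast between the two arithmetic styles is easiest to see with a toy example. The sketch below shows simple symmetric linear quantization of float weights to int8 and back; this is the general idea behind integer inference, not Google's specific scheme, and the tensor is invented for illustration.

```python
import numpy as np

# Toy symmetric linear quantization: map fp32 weights onto int8 and back.
weights_fp32 = np.random.randn(4, 4).astype(np.float32)

scale = np.abs(weights_fp32).max() / 127.0          # fp32 range -> int8 range
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)
weights_dequant = weights_int8.astype(np.float32) * scale

# Quantized inference trades a small, bounded error for cheaper integer MACs;
# training generally needs the dynamic range of floating point, which is why
# a single floating-point chip simplifies deployment of both workloads.
print("max quantization error:", np.abs(weights_fp32 - weights_dequant).max())
```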

Google is packing four of the new chips on a custom accelerator board. It packs at least 64 of these boards on a two-dimensional torus network in a cluster, called a pod, that is capable of up to 11.5 petaflops. The initial chip rode a PCI Express card in an x86 server and was focused solely on inference jobs.
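For readers unfamiliar with the topology, a two-dimensional torus is a grid whose edges wrap around, so every node has exactly four neighbors. The sketch below assumes an 8x8 arrangement of the 64 boards purely for illustration; Google has not published the pod's exact dimensions.

```python
# Two-dimensional torus: a grid with wrap-around links, so every node has
# four neighbors. The 8x8 shape (64 boards) is an assumption for illustration.
ROWS, COLS = 8, 8


def torus_neighbors(r, c):
    return [
        ((r - 1) % ROWS, c),  # up    (wraps from row 0 to the last row)
        ((r + 1) % ROWS, c),  # down
        (r, (c - 1) % COLS),  # left
        (r, (c + 1) % COLS),  # right
    ]


# Even a corner node has four neighbors because of the wrap-around links.
print(torus_neighbors(0, 0))  # [(7, 0), (1, 0), (0, 7), (0, 1)]
```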

Google is not providing details of what’s inside the new ASICs and systems or their performance. But the details that it did mention, along with the pictures that it released, raise some interesting questions.

The chip’s ability to run training as well as inference required the move to floating-point. But that also likely drives power consumption up to at least twice the 40 W of the initial TPU. The new ASICs sport huge fans and heat sinks, suggesting that Google is pushing thermals to the limit.

To double performance, Google may have simply increased the number of multiply-accumulate blocks and the amount of cache used in the original chip. However, the first-gen ASIC already packed a 24-Mbyte cache, about as much as many Intel server CPUs. It’s possible that, under some of its heat sinks, Google is using HBM memory stacks.
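The relationship between multiply-accumulate count, clock rate, and peak throughput can be sanity-checked with simple arithmetic. The first-generation figures (a 256x256 MAC array at 700 MHz, roughly 92 TOPS at two operations per MAC per cycle) come from Google's published TPU paper; the per-chip 45-TFLOPS target and the candidate clock rates below are my assumptions, and a floating-point MAC is much larger than an 8-bit integer MAC, so raw counts across the two generations are not directly comparable.

```python
# Peak throughput = MAC count * 2 ops (multiply + add) * clock rate.
def peak_ops(macs, clock_ghz):
    return macs * 2 * clock_ghz * 1e9


# First-gen check: 256x256 array at 700 MHz -> ~92 TOPS (matches the TPU paper).
print("first gen: %.1f TOPS" % (peak_ops(256 * 256, 0.7) / 1e12))

# How many MACs would ~45 TFLOPS per chip require at a few assumed clocks?
for clock_ghz in (0.7, 1.0, 1.4):
    macs = 45e12 / (2 * clock_ghz * 1e9)
    print("45 TFLOPS at %.1f GHz needs ~%d MACs" % (clock_ghz, macs))
```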