Tensors are Critical for AI Processing But What Are Tensors? TPUs?

Dan Fleisch briefly explains some vector and tensor concepts from A Student’s Guide to Vectors and Tensors. In the field of machine learning, tensors are used to represent many kinds of data, such as images and videos. They form the basis of the TensorFlow machine learning framework. It is useful to understand tensors, TensorFlow, and TPUs (Tensor Processing Units).

Tensors are simply mathematical objects that can be used to describe physical properties, just like scalars and vectors. In fact, tensors are merely a generalization of scalars and vectors: a scalar is a zero-rank tensor, and a vector is a first-rank tensor.

A tensor is something that holds values, some kind of table or array. A tensor has an order indicating how many axes its values are arranged along.

For example:

A tensor of order 0 is simply a single scalar number.
A tensor of order 1 is a vector. Each element is numbered by one index.
A tensor of order 2 is a matrix. Each element has two indices, e.g. row and column.

In the machine learning literature, a tensor is simply a synonym for multi-dimensional array.
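
Here is what tensors of order 0, 1, and 2 look like in code (a minimal sketch using TensorFlow, which this article discusses below):

    import tensorflow as tf

    scalar = tf.constant(3.0)                     # order 0: a single number
    vector = tf.constant([1.0, 2.0, 3.0])         # order 1: one index per element
    matrix = tf.constant([[1.0, 2.0],
                          [3.0, 4.0]])            # order 2: row and column indices

    print(scalar.ndim, vector.ndim, matrix.ndim)  # prints: 0 1 2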

TensorFlow includes eager execution, in which operations are evaluated immediately as they are called, making code easier to step through and debug.
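
For example (a minimal sketch, assuming TensorFlow 2.x, where eager execution is on by default):

    import tensorflow as tf  # TF 2.x executes eagerly by default

    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x)  # runs immediately; no graph or session to build first
    print(y)             # the intermediate result can be inspected right away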

In general, a tensor is an N-dimensional generalization of a matrix.

CPU vs GPU vs TPU

The difference between the CPU, GPU, and TPU is that the CPU handles all of the logic, calculation, and input/output of the computer; it is a general-purpose processor. The GPU is an additional processor used to enhance graphics and run highly parallel, high-end tasks. TPUs are powerful custom-built processors designed to run workloads built on a specific framework, namely TensorFlow.

CPU: Central Processing Unit. Manages all the functions of a computer.
GPU: Graphics Processing Unit. Enhances the graphical performance of the computer.
TPU: Tensor Processing Unit. A custom-built ASIC that accelerates TensorFlow projects.

What is a TPU?
A Tensor Processing Unit (TPU) is an application-specific integrated circuit (ASIC) built to accelerate AI calculations and algorithms. Google developed it specifically for neural network machine learning with TensorFlow, Google’s own machine learning software.

Google began using TPUs internally in 2015 and made them publicly available in 2018. TPUs are offered both as a cloud service and as smaller versions of the chip for edge devices.

TPUs are custom-built processing units designed to work with a specific framework: TensorFlow, an open-source machine learning platform with state-of-the-art tools, libraries, and community support, so users can quickly build and deploy ML applications.

Cloud TPU allows you to run your machine learning projects on TPUs using TensorFlow. Designed for powerful performance and flexibility, Google’s TPUs help researchers and developers run models with high-level TensorFlow APIs.
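
A minimal sketch of the standard TensorFlow 2.x pattern for this, assuming an environment with a Cloud TPU attached (such as a Colab or GCE TPU VM; details vary by setup):

    import tensorflow as tf

    # Locate and initialize the TPU, then create a distribution strategy
    # that replicates work across its cores.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)

    with strategy.scope():
        # A model built inside the scope is replicated onto the TPU cores.
        model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
        model.compile(optimizer="adam", loss="mse")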

Models that used to take weeks to train on GPUs or other hardware can be turned out in hours with TPUs.

Google has reported that the TPU is 15x to 30x faster than contemporary GPUs and CPUs on production AI applications that use neural network inference.

Is a TPU better than a GPU for machine learning?

A single GPU can process thousands of tasks at once, but GPUs typically work with neural networks less efficiently than a TPU does. TPUs are more specialized for machine learning calculations: they require more data traffic up front, but after that they achieve more with less power consumption.

How Do GPUs Work?
GPUs work via parallel computing: the ability to perform many tasks at once. This parallelism is also what makes them so valuable.

GPU parallel computing enables GPUs to break complex problems into thousands or millions of separate tasks and work them out all at once, instead of one by one as a CPU must.
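
A small sketch of the contrast (NumPy here runs on the CPU; the point is the programming model: a one-by-one loop versus a single bulk operation that a parallel processor can split across thousands of cores):

    import numpy as np

    a = np.random.rand(1_000_000)
    b = np.random.rand(1_000_000)

    # CPU-style: one multiplication at a time, a million sequential steps.
    serial = [x * y for x, y in zip(a, b)]

    # GPU-style: one bulk operation that a parallel processor can split
    # across thousands of cores, each handling different elements.
    parallel = a * b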

GPU Pros and Cons
The parallel processing ability makes GPUs a versatile tool and great choice for a range of functions such as gaming, video editing, and cryptocurrency/blockchain mining.

GPUs are great for AI and machine learning (ML). ML is a form of data analysis that automates the construction of analytic models.

The modern GPU typically has between 2,500 and 5,000 arithmetic logic units (ALUs) in a single processor, enabling it to execute thousands of multiplications and additions simultaneously.

GPUs are designed as general-purpose processors that must support millions of different applications and programs. So while a GPU can run many operations at once, it must access registers or shared memory to read and store intermediate calculation results.

And since the GPU performs tons of parallel calculations on its thousands of ALUs, it also expends large amounts of energy accessing memory, which in turn increases the GPU’s energy footprint.

How Do TPUs Work?
Here’s how a TPU works:

The TPU loads the parameters from memory into its matrix of multipliers and adders.
The TPU loads the data from memory.
As each multiplication is executed, its result is passed on to the next multiplier while a running summation is accumulated along the way.
The output of these steps is the sum of all the multiplication results between the data and the parameters.

No intermediate memory access is required throughout this entire process of massive calculation and data passing.
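
A toy Python sketch of this multiply-accumulate flow (an illustration of the idea only, not real TPU code): the parameters are loaded once, the data flows through, and partial sums are passed along rather than written back to memory.

    def systolic_matvec(weights, x):
        """Multiply a weight matrix by a vector, passing each product
        straight into a running sum with no intermediate stores."""
        outputs = []
        for row in weights:            # each row is one lane of multipliers
            acc = 0
            for w, xi in zip(row, x):  # each multiplication result flows
                acc += w * xi          # directly into the accumulator
            outputs.append(acc)
        return outputs

    print(systolic_matvec([[1, 2], [3, 4]], [10, 20]))  # [50, 110]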

TPU Pros and Cons
TPUs are more expensive than GPUs and CPUs.

TPUs are a great choice for those who want to:

Accelerate machine learning applications
Scale applications quickly
Cost-effectively manage machine learning workloads
Start with well-optimized, open source reference models

Tesla Dojo and its GPU Supercomputer

Tesla has been using a huge supercomputer powered by NVIDIA GPUs to process its FSD (Full Self-Driving) data and build better models. It consists of 5,760 NVIDIA A100 graphics cards installed in 720 nodes of eight GPUs each, and it is capable of 1.8 exaflops of performance, placing it among the fastest supercomputers in the world. One of the tasks this system performs is “autolabeling”, which adds labels to raw data so that it can become part of a decision-making system.

Tesla’s latest GPU-based supercomputing cluster has nearly 40 million GPU cores.

Tesla is also building its own chips and systems into a platform called Dojo. Dojo is different: instead of combining lots of smaller chips, its D1 tile is one big chip with 354 cores aimed specifically at AI and ML. Six of these tiles are combined into a tray, alongside supporting computing hardware. Two trays can be installed in a single cabinet, giving each cabinet 4,248 cores and a 10-cabinet “exapod” 42,480 cores. Because Dojo is specifically optimized for AI and ML processing, it is orders of magnitude faster than either CPUs or GPUs for the same datacenter footprint.

5 thoughts on “Tensors are Critical for AI Processing But What Are Tensors? TPUs?”

  1. A tensor is just a multidimensional array of numbers, which you treat as being a vector or matrix (or higher-dimensional hyper-matrix).

    A TPU is mainly designed to multiply them. So it can multiply a vector by a matrix. Or multiply two matrices. Or take the dot product of two vectors. Or do equivalent multiplications of higher-dimensional tensors. TPUs can also add these tensors, but you wouldn’t use a TPU just for adding.

    For neural networks, the most common operation is multiplying a vector by a matrix.

    For deep learning (convolutional neural networks), the same is true, except the vector and matrix may need to take their elements from inside a bigger matrix, which is complicated to do in software on a CPU or GPU.

    BTW, in physics, such as general relativity, the word “tensor” means something slightly different. It’s roughly the combination of the tensor described above plus a coordinate system. So a 1D “vector” might be a list of numbers along with a coordinate system. And another vector might be a different list of numbers with a different coordinate system. Yet they count as being the same vector (they count as equal) if they represent the same arrow in space, just described in two different ways, with two different coordinate systems. The arrow is the actual vector, and the numbers are just a way to describe it. And if you can’t do all that, then it just counts as a pseudo vector rather than a true vector.

    But none of that is an issue in computer science, as implemented in a TPU. The TPU just treats any array of numbers as being a tensor. So any 1D array of numbers counts as being a vector.

  2. I’ve lost track of how many times I’ve forwarded/recommended Fleisch’s tensor video. If it’s not my most forwarded video clip of all time, it’s close. This is how it’s done.

  3. As I understand it, in programming (for example, Python) you have objects. Objects have properties and methods (functions). To store data you use properties (if you use objects), arrays, and lists, depending on what you are doing. Tensors are another way to store data, like multiple bags with your items inside: you put things in, and it is similar with data in programming. Functions do something with that data, the way you look into the bags, go through your items (loops), and take out what you need. There are different ways to store data. Tensors are a multi-dimensional way to store data (e.g. along X, Y, Z axes), and TPUs are designed to work with such data structures and use the mathematical operations and functions most appropriate for them in machine learning.

    This is my understanding. It makes sense to build dedicated hardware for machine learning rather than use ordinary GPUs.
