Eight Nvidia A100 Next Generation Tensor Chips for 5 Petaflops at $200,000

The Nvidia A100 chip was presented at the Hot Chips conference. Sander Olson provided Nextbigfuture with the presentation.

The Nvidia A100 is a third-generation Tensor Core chip. It is faster and more efficient than its predecessor, the Nvidia V100, and than competing chips such as the Google Tensor Processing Unit (TPU) version 3 and the Huawei Ascend. The Nvidia A100 even outperforms the unreleased Google TPU version 4 in most categories.

The A100 supports a comprehensive range of data types and adds sparsity acceleration, asynchronous data movement and synchronization, and increased L1 cache and shared memory (L1/SMEM) capacity.

NVIDIA DGX A100 is the universal system for all AI workloads, offering unprecedented compute density, performance, and flexibility in the world’s first 5 petaFLOPS AI system. NVIDIA DGX A100 features the world’s most advanced accelerator, the NVIDIA A100 Tensor Core GPU, enabling enterprises to consolidate training, inference, and analytics into a unified, easy-to-deploy AI infrastructure that includes direct access to NVIDIA AI experts.

The Universal System for Every AI Workload

NVIDIA DGX A100 systems integrate eight of the new NVIDIA A100 Tensor Core GPUs, providing 320GB of memory for training the largest AI datasets, and the latest high-speed NVIDIA Mellanox® HDR 200Gbps interconnects.

Multiple smaller workloads can be accelerated by partitioning the DGX A100 into as many as 56 instances per system, using the A100 multi-instance GPU feature. Combining these capabilities enables enterprises to optimize computing power and resources on demand to accelerate diverse workloads, including data analytics, training and inference, on a single, fully integrated, software-defined platform.
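The system-level figures quoted above (320GB of memory, 56 instances, 5 petaFLOPS) follow from multiplying per-GPU numbers by the eight GPUs in a DGX A100. The short Python sketch below shows that arithmetic. The per-GPU values are assumptions taken from Nvidia's public A100 specifications rather than from this article: 40GB of memory per GPU, up to seven Multi-Instance GPU partitions per GPU, and a peak of 624 teraFLOPS for FP16 Tensor Core math with sparsity. The function name `dgx_totals` is made up for illustration.

```python
# Assumed per-GPU figures (from Nvidia's public A100 specifications,
# not stated in the article itself).
GPUS_PER_DGX = 8                   # stated in the article
MEMORY_PER_GPU_GB = 40             # assumption: A100 40GB model
MIG_INSTANCES_PER_GPU = 7          # assumption: maximum MIG partitions per A100
SPARSE_FP16_TFLOPS_PER_GPU = 624   # assumption: peak FP16 Tensor Core with sparsity


def dgx_totals(gpus: int = GPUS_PER_DGX) -> dict:
    """Scale the per-GPU figures up to a whole system (illustrative helper)."""
    return {
        "memory_gb": gpus * MEMORY_PER_GPU_GB,
        "mig_instances": gpus * MIG_INSTANCES_PER_GPU,
        "ai_petaflops": gpus * SPARSE_FP16_TFLOPS_PER_GPU / 1000,
    }


totals = dgx_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

Under these assumptions the totals come to 320GB, 56 instances and just under 5 petaFLOPS, which matches the rounded figures in Nvidia's description. The petaFLOPS number is a peak rating for sparse reduced-precision math, so real workloads should not be expected to sustain it.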

DGX A100 is the universal system for all AI workloads, from analytics to training to inference. DGX A100 sets a new bar for compute density, packing 5 petaFLOPS of AI performance into a 6U form factor and replacing legacy compute infrastructure with a single, unified system. DGX A100 can deliver fine-grained allocation of computing power using the Multi-Instance GPU capability in the NVIDIA A100 Tensor Core GPU. Administrators can assign resources that are right-sized for specific workloads, so both large and small jobs can be supported. The combination of dense compute power and complete workload flexibility makes DGX A100 an ideal choice for both single-node deployments and large-scale Slurm and Kubernetes clusters deployed with NVIDIA DeepOps.

NVIDIA DGX A100 is more than a server; it is a complete hardware and software platform.

SOURCES – Nvidia, Hot Chips Conference
Written by Brian Wang, Nextbigfuture.com