US Military Shipping Container Supercomputer Has 6 Petaflop Performance

The Department of Defense (DoD) High Performance Computing Modernization Program (HPCMP) has a supercomputer housed in a shipping container that delivers 6 petaFLOPS of performance. It will be used for both machine learning training and inference workloads, and it has 1.3 petabytes of solid-state storage.

The US Army paid $12 million for the 6 petaflop supercomputer in a shipping container.
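As a rough yardstick, the price works out to a simple cost per petaflop. This is plain arithmetic on the two headline figures and ignores facility, power, and support costs, which the article does not break out:

$$
\frac{\$12\ \text{million}}{6\ \text{PFLOPS}} = \$2\ \text{million per PFLOPS}
$$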

The shipping container supercomputer system has:

* 22 nodes for machine learning training workloads, each with two IBM Power9 processors, 512GB of system memory, 8 Nvidia V100 GPUs with 32GB of high-bandwidth memory each, and 15TB of local solid-state storage
* 128 nodes for inference workloads, each with two IBM Power9 processors, 256GB of system memory, 4 Nvidia T4 GPUs with 16GB of GDDR6 memory each, and 4TB of local solid-state storage
* Three solid-state parallel file systems, totaling 1.3 PB
* A 100 gigabit per second InfiniBand network, as well as dual 10 gigabit Ethernet networks
* Platform LSF HPC job scheduling integrated with a Kubernetes container orchestration solution
* Integrated support for TensorFlow, PyTorch, and Caffe, in addition to traditional HPC libraries and toolsets such as FFTW and Dakota
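The per-node figures in the list can be rolled up into system-wide totals. The following is a small illustrative sketch, not an official inventory: the `NodeType` class and `totals` helper are hypothetical names, it assumes 1 TB = 1,024 GB when converting system memory, and it counts only node-local SSDs, not the 1.3 PB of parallel file systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    """Per-node configuration as listed in the article (hypothetical helper type)."""
    name: str
    count: int          # number of nodes of this type
    gpus_per_node: int
    memory_gb: int      # system memory per node, GB
    local_ssd_tb: int   # local solid-state storage per node, TB


NODE_TYPES = [
    NodeType("training (V100)", 22, 8, 512, 15),
    NodeType("inference (T4)", 128, 4, 256, 4),
]


def totals(node_types):
    """Sum nodes, GPUs, system memory (TB, assuming 1 TB = 1024 GB) and local SSD."""
    return {
        "nodes": sum(n.count for n in node_types),
        "gpus": sum(n.count * n.gpus_per_node for n in node_types),
        "memory_tb": sum(n.count * n.memory_gb for n in node_types) / 1024,
        "local_ssd_tb": sum(n.count * n.local_ssd_tb for n in node_types),
    }


# Per-type subtotals, then the system-wide roll-up.
for n in NODE_TYPES:
    print(f"{n.name}: {n.count * n.gpus_per_node} GPUs, "
          f"{n.count * n.memory_gb} GB memory, {n.count * n.local_ssd_tb} TB local SSD")
print(totals(NODE_TYPES))
```

With the article's numbers this gives 150 nodes, 688 GPUs (176 V100 plus 512 T4), 43 TB of system memory, and 842 TB of node-local SSD.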

The IBM Power System IC922, a Power9-based inference server, has been released for commercial sale.

The DoD HPC Modernization Program (HPCMP) aims to have:
* a 100 petaflops system by 2025
* a cognitive production system in 2026
* an exaflops system in 2031
* a 10 exaflops system and a quantum pilot in 2036
* a quantum production system in 2040
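The roadmap implies a steady compounding of capability between milestones. The sketch below computes the constant yearly growth factor between consecutive targets; it assumes, purely for illustration, that each milestone is a single system's peak performance and that growth is smoothly exponential. The `annual_growth_factor` function is a hypothetical helper, not part of any HPCMP plan.

```python
def annual_growth_factor(start_value, end_value, years):
    """Constant yearly multiplier that takes start_value to end_value over `years` years."""
    return (end_value / start_value) ** (1 / years)


# Roadmap milestones from the article, in petaflops (1 exaflops = 1,000 petaflops).
milestones = [(2025, 100), (2031, 1_000), (2036, 10_000)]

# Compare each milestone with the next one.
for (y0, p0), (y1, p1) in zip(milestones, milestones[1:]):
    factor = annual_growth_factor(p0, p1, y1 - y0)
    print(f"{y0}->{y1}: x{factor:.3f} per year")
```

Both steps are tenfold jumps, but the second is squeezed into five years instead of six, so the implied pace rises from roughly 1.47x to roughly 1.58x per year.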

In 2019, the DoD HPCMP procured a 12.8-petaFLOPS system. This will increase the DoD HPCMP’s aggregate supercomputing capability to 53 petaFLOPS.
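The 12.8 and 53 petaFLOPS figures also imply how much capability the program had before this purchase. Writing $P_{\text{new}}$ for the new system, $P_{\text{after}}$ for the aggregate after installation, and $P_{\text{before}}$ for the prior aggregate (labels introduced here for illustration), and assuming nothing was retired at the same time (the article does not say):

$$
P_{\text{before}} = P_{\text{after}} - P_{\text{new}} = 53 - 12.8 = 40.2\ \text{PFLOPS},
\qquad
\frac{P_{\text{new}}}{P_{\text{after}}} = \frac{12.8}{53} \approx 0.24
$$

On that assumption, the single new system would account for roughly a quarter of the program's total capability.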

The system, the HPCMP’s first with more than 10 petaFLOPS of peak computational performance, will be installed at the Navy DoD Supercomputing Resource Center (DSRC) at Stennis Space Center, Mississippi, and will serve users from all of the services and agencies of the Department.

The architecture of the system is as follows:

* A Cray Shasta system with 290,304 AMD EPYC “Rome” compute cores
* 112 NVIDIA Volta V100 general-purpose graphics processing units (GPGPUs)
* A 200 gigabit per second Cray Slingshot interconnect
* 1 PB of NVMe-based solid-state storage
* 590 terabytes of memory
* 14 petabytes of usable storage
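Two quick ratios help put these specifications in context: average memory per CPU core, and the interconnect's nominal rate in bytes rather than bits, compared with the 100 Gb/s InfiniBand in the container system. This sketch assumes the 590 TB is decimal terabytes spread evenly across all cores (it may include GPU memory, which the article does not separate) and ignores link encoding and protocol overhead. The helper names are hypothetical.

```python
# Figures from the article for the 2019 Cray Shasta procurement.
CORES = 290_304
MEMORY_TB = 590          # total memory, decimal terabytes (assumed)
SLINGSHOT_GBPS = 200     # nominal interconnect rate, gigabits per second
INFINIBAND_GBPS = 100    # container system's InfiniBand rate, gigabits per second


def memory_per_core_gb(memory_tb, cores):
    """Average memory per CPU core in GB, assuming memory is spread evenly."""
    return memory_tb * 1_000 / cores


def link_rate_gbytes(gbits_per_s):
    """Convert a nominal link rate from gigabits/s to gigabytes/s (8 bits per byte)."""
    return gbits_per_s / 8


print(f"{memory_per_core_gb(MEMORY_TB, CORES):.2f} GB per core")
print(f"{link_rate_gbytes(SLINGSHOT_GBPS):.1f} GB/s per Slingshot link")
print(f"{link_rate_gbytes(INFINIBAND_GBPS):.1f} GB/s per InfiniBand link")
```

This works out to about 2.03 GB of memory per core on average, and a nominal 25 GB/s per Slingshot link versus 12.5 GB/s for the container system's InfiniBand.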

The system is expected to enter production service early in fiscal year 2021.

Thoughts on “US Military Shipping Container Supercomputer Has 6 Petaflop Performance”

  1. The running joke used to be that you had so many servers you could lose some in some dark corner, then you had so many server containers you could lose some, and apparently Googlers now joke about losing whole datacenters like pocket lint…

  2. So it sounds like a couple of Ceph-RBD racks fronted by LIO. Power9 systems typically run Linux in little-endian mode, the same byte order as x86 (unlike older big-endian PowerPC systems), so porting software to them is not that hard (as Google discovered).

    All-SSD storage is a must: Google and Microsoft discovered they couldn’t make a reliable container datacenter with spinning HDDs unless they sacrificed a lot of space for shock dampers (longshoremen and truckers tend to be a little rough with their cargo…)

  3. It looks like they need these systems to be cognitive by mid-decade, possibly for something to do with autonomous systems programming on the fly?

  4. 1.3 PB of Solid State Storage (SSD or NVMe) ain’t a lot these days. A single stamp (rack) of Azure NetApp Files is 1.5 PB.

  5. They wouldn’t necessarily be shipping it around. The cloud builders found great utility in using shipping containers to build their data centers. Build out times were accelerated while making each unit of compute self-contained. Once a certain percent of a container was malfunctioning, they simply swapped it out for a new one… plug-n-play.
