Nvidia Vera Rubin Used by Google Cloud Next and Thinking Machines Lab

Nvidia's Vera Rubin platform, which pairs the Rubin GPU with the Vera CPU, is a full-stack system with co-packaged optics (CPO) and the next-generation successor to Blackwell. The Vera CPUs are aimed at agentic workloads, and Groq LPU integration supports inference disaggregation. Combined Blackwell and Rubin orders are projected at $1T through 2027, and the platform is expected to deliver 10x better inference performance per watt.

At Google Cloud Next, Google announced A5X powered by NVIDIA Vera Rubin NVL72 rack-scale systems, which — through extreme codesign across chips, systems and software — deliver up to 10x lower inference cost per token and 10x higher token throughput per megawatt than the prior generation.

A5X will use NVIDIA ConnectX-9 SuperNICs, combined with next-generation Google Virgo networking, scaling to up to 80,000 NVIDIA Rubin GPUs within a single site cluster and up to 960,000 NVIDIA Rubin GPUs in a multisite cluster, enabling customers to run their largest AI workloads on NVIDIA‑optimized infrastructure.
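To put those cluster ceilings in perspective, the short sketch below converts the quoted GPU counts into rack counts and an implied number of sites. It rests on two assumptions that the announcement does not state: that the "72" in NVL72 means 72 GPUs per rack, and that a multisite cluster is built from sites that each reach the single-site maximum. The function and constant names are illustrative only.

```python
import math

# Figures quoted in the announcement (upper bounds, "up to").
SINGLE_SITE_GPUS = 80_000
MULTISITE_GPUS = 960_000

# Assumption: one NVL72 rack-scale system holds 72 GPUs.
GPUS_PER_RACK = 72


def racks_needed(gpus: int, gpus_per_rack: int = GPUS_PER_RACK) -> int:
    """Smallest whole number of racks that can hold `gpus` GPUs."""
    return math.ceil(gpus / gpus_per_rack)


def sites_implied(total_gpus: int, gpus_per_site: int) -> float:
    """Number of sites if every site ran at the single-site maximum (assumption)."""
    return total_gpus / gpus_per_site


if __name__ == "__main__":
    print("racks per single-site cluster:", racks_needed(SINGLE_SITE_GPUS))
    print("racks per multisite cluster:", racks_needed(MULTISITE_GPUS))
    print("implied sites:", sites_implied(MULTISITE_GPUS, SINGLE_SITE_GPUS))
```

Under these assumptions, a full single-site cluster is a little over 1,100 racks, and the multisite ceiling is exactly 12 times the single-site ceiling. Real deployments may divide GPUs across sites differently.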

Leading frontier AI labs are already putting this infrastructure to work. Thinking Machines Lab is scaling its Tinker application programming interface (API) on A4X Max VMs with GB300 NVL72 systems to accelerate training, while OpenAI is running large‑scale inference on NVIDIA GB300 (A4X Max VMs) and GB200 NVL72 systems (A4X VMs) on Google Cloud for some of its most demanding inference workloads, including for ChatGPT.
