Google’s Gemini 2 AI model (just released) was trained on Trillium TPUs. Over 100,000 Trillium chips have been deployed in a single network fabric, enabling massive-scale AI operations. xAI has already trained Grok 3 with 100,000 Nvidia H100s but has not released it yet. xAI has added 100,000 chips and will train Grok 4 with 200,000 Nvidia H100s and H200s. Grok 4 will be released in April 2025. Google and xAI are the leaders in AI compute, each with over 100,000 GPUs or TPUs used for model training. xAI is scaling to a million GPUs by the end of 2025. Google has that number of TPUs, but it may not integrate them in one building or one coherent memory.
Google Gemini 2.0 was trained with Trillium, Google’s sixth-generation Tensor Processing Unit (TPU). This custom AI accelerator is now generally available to cloud customers, showcasing Google’s commitment to building an extensive computational infrastructure. Over 100,000 Trillium chips have been deployed in a single network fabric, enabling massive-scale AI operations.
Google has millions of TPU chips in multiple buildings and facilities. AI training requires all of the chips to be in one network and sharing one memory. We will need to see how Google integrates its many TPU chips into one system for large AI model training.
Nvidia H100s had challenges scaling beyond 30,000 coherent chips for AI training clusters. Google has different chips with different networking capabilities.
Google is keeping pace with xAI in scaling to roughly 100,000 Nvidia H100-class chips for its AI training cluster.
xAI is training Grok 3 with 100,000 Nvidia H100s (releasing January or February) and will train Grok 4 with 200,000 Nvidia H100s (releasing April/May).
xAI Grok 5 is training with 100,000 to 200,000 Nvidia B200s (releasing about August).
The Google AI training campus shown above already has a power capacity close to 300 MW (2024) and will ramp up to 500 MW in 2025. Google has already deployed millions of liquid-cooled TPUs, accounting for more than one gigawatt (GW) of liquid-cooled AI chip capacity.
In 2025, Google will have the ability to conduct gigawatt-scale training runs across multiple campuses, but Google’s long-term plans aren’t nearly as aggressive as those of xAI, OpenAI and Microsoft.

SemiAnalysis provides information on the Google AI training centers.
OpenAI and Microsoft were scaling with multiple buildings and facilities.
xAI made a breakthrough in scaling which seems to allow them to scale to millions of GPUs and beyond in one facility.
Nvidia H100 a Bit Better than TPU V6 and B200 is 4X Better
TPU v6: Offers 918 TFLOPs for BF16 and 1836 TOPs for INT8 per chip
H100: Provides approximately 1000 TFLOPs for FP16/BF16 and 2000 TOPs for INT8
B200:
Offers up to 4x higher inference performance than the H100 in generative AI tasks, such as Llama 2 70B inference, using FP4 precision, which doubles throughput compared to FP8 on the H100
Achieves 2.2x higher training performance than the H100 in tasks like fine-tuning Llama 2 and pre-training GPT-3
FP8 Tensor Core performance reaches 9 PFLOPS per GPU, with a total of 72 PFLOPS for an 8-GPU system
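To put the per-chip figures above side by side, here is a rough Python sketch. It only uses the peak numbers quoted in this article and the 100,000-chip cluster size discussed earlier. The helper names (`cluster_exaflops`, `ratio`) are made up for illustration, and the aggregate is a theoretical peak that assumes perfect scaling, which no real cluster achieves.

```python
# Peak per-chip BF16/FP16 figures as quoted in this article, in TFLOPs.
PEAK_TFLOPS = {
    "TPU v6": 918,
    "H100": 1000,  # "approximately 1000" in the article
}

CLUSTER_CHIPS = 100_000  # cluster size discussed in the article


def cluster_exaflops(per_chip_tflops, chips=CLUSTER_CHIPS):
    """Aggregate peak compute in exaFLOPs (1 exaFLOP = 1,000,000 TFLOPs)."""
    return per_chip_tflops * chips / 1_000_000


def ratio(chip, baseline):
    """How many times the peak throughput of `baseline` the `chip` offers."""
    return PEAK_TFLOPS[chip] / PEAK_TFLOPS[baseline]


for name, tflops in PEAK_TFLOPS.items():
    print(f"{name}: {cluster_exaflops(tflops):.1f} exaFLOPs peak "
          f"across {CLUSTER_CHIPS:,} chips")

print(f"H100 vs TPU v6 per chip: {ratio('H100', 'TPU v6'):.2f}x")

# B200 system figure quoted above: 9 PFLOPS FP8 per GPU, 8 GPUs per system.
print(f"8-GPU B200 system: {9 * 8} PFLOPS FP8")
```

On these numbers the H100 has about a 9% per-chip edge in peak BF16 throughput, and a 100,000-chip Trillium cluster peaks near 92 exaFLOPs. Real training throughput depends on utilization, memory and interconnect, none of which this sketch captures.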


Google’s TPU (Tensor Processing Unit) has evolved significantly from version 4 to version 6, with substantial improvements in performance, memory, and efficiency. Let’s compare these two generations:
Performance
TPU v6 (codenamed Trillium) offers a dramatic increase in computational power compared to TPU v4:
Peak Compute Performance: TPU v6 provides 918 TFLOPs for BF16 and 1836 TOPs for INT8 per chip
In contrast, TPU v4 offered 275 TFLOPs for both BF16 and INT8
Overall Improvement: TPU v6 achieves a 4.7x increase in peak compute performance per chip compared to TPU v5e
While not directly compared to v4, this suggests a significant leap from the v4 generation.
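The v4-to-v6 ratio can be estimated directly from the per-chip figures quoted above. This is a back-of-the-envelope Python sketch, not a benchmark: the dictionary keys and the `speedup` helper are illustrative names, and peak ratings say nothing about sustained performance.

```python
# Per-chip peak figures quoted in this article.
TPU_V4 = {"bf16_tflops": 275, "int8_tops": 275}
TPU_V6 = {"bf16_tflops": 918, "int8_tops": 1836}


def speedup(new, old):
    """Return the new/old ratio for each metric present in both dicts."""
    return {metric: new[metric] / old[metric] for metric in old if metric in new}


for metric, factor in speedup(TPU_V6, TPU_V4).items():
    print(f"{metric}: {factor:.2f}x")
```

By these peak ratings, TPU v6 is roughly 3.3x TPU v4 per chip for BF16 and roughly 6.7x for INT8.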
Memory
Memory capacity and bandwidth have been substantially upgraded in TPU v6:
HBM Capacity: TPU v6 features 32 GB of High Bandwidth Memory (HBM) per chip, doubling the 16 GB available in TPU v5e (TPU v4 already had 32 GB per chip)
Memory Bandwidth: TPU v6 boasts 1640 GBps of HBM bandwidth up from 1200 GBps in TPU v4
Interconnect
The inter-chip communication has been enhanced:
Interconnect Bandwidth: TPU v6 offers 3584 Gbps of Inter-chip Interconnect (ICI) bandwidth, more than doubling the capabilities of previous generations.
Energy Efficiency
TPU v6 demonstrates significant improvements in energy efficiency:
TPU v6 is over 67% more energy-efficient than TPU v5e
While not directly compared to v4, this suggests a substantial improvement in power efficiency over earlier generations.
System Architecture
Both generations support large-scale deployments, but with different configurations:
TPU v6: Supports pods of up to 256 chips
TPU v4: Supported larger pods of up to 4096 chips
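Pod size and per-chip speed pull in opposite directions here, so it helps to multiply them out. The Python sketch below uses only the pod sizes and BF16 figures quoted in this article. The helper names are hypothetical, and it assumes peak compute simply adds up across chips and pods, which ignores how pods are actually networked together.

```python
# Figures quoted in this article: max chips per pod, peak BF16 TFLOPs per chip.
PODS = {
    "TPU v4": {"chips": 4096, "bf16_tflops": 275},
    "TPU v6": {"chips": 256, "bf16_tflops": 918},
}


def pod_petaflops(pod):
    """Peak BF16 compute of one full pod, in petaFLOPs (1 PFLOP = 1,000 TFLOPs)."""
    return pod["chips"] * pod["bf16_tflops"] / 1000


def pods_needed(total_chips, pod):
    """Full pods required to hold `total_chips` (ceiling division)."""
    return -(-total_chips // pod["chips"])


for name, pod in PODS.items():
    print(f"{name}: {pod_petaflops(pod):.1f} PFLOPs peak per full pod")

print(f"TPU v6 pods for 100,000 chips: {pods_needed(100_000, PODS['TPU v6'])}")
```

On these figures a full v4 pod still has roughly 4.8x the peak BF16 compute of a full v6 pod, so a 100,000-chip Trillium deployment implies linking several hundred pods over the wider network fabric.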
Application Focus
TPU v6 seems to have a broader focus on various AI workloads:
It is optimized for transformer models, text-to-image applications, and convolutional neural networks (CNNs)
It includes the third-generation SparseCore, specialized for processing large embeddings in ranking and recommendation systems
TPU v6 represents a significant leap forward from TPU v4, offering substantially higher performance, increased memory capacity and bandwidth, improved energy efficiency, and specialized capabilities for a wider range of AI workloads. While TPU v4 supported larger pod sizes, TPU v6 compensates with its dramatically increased per-chip performance and efficiency.

Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technology and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Was there news that Grok 3 has already finished training, or is it just a guess?