Elon Musk says SpaceX has almost finished building its own heavily optimized AI training software from scratch in the C programming language. He claims it could be more than 10 times faster than Google's JAX framework on large training runs.
It is designed to run on a massive cluster of 220,000 cutting-edge NVIDIA GB300 GPUs connected by ultra-fast 800G networks. The stack relies on pipeline parallelism and runs as close to the raw hardware (bare metal) as possible to minimize overhead.
SpaceX has almost finished writing V1.0 of an in-house AI training stack in C that exact-maps to 220k GB300s with 800G NICs, making heavy use of pipeline parallelism and getting as close to bare metal as possible.
The potential speed improvement vs JAX for large training runs is…
— Elon Musk (@elonmusk) May 28, 2026
This is a major departure from the higher-level frameworks that most labs rely on, such as JAX, which xAI previously used alongside some custom Rust components.
What “over an order of magnitude” speedup vs JAX actually means
Elon stated the potential speed improvement for large training runs is over an order of magnitude (over 10× faster wall-clock time for equivalent work).
Why is this possible?
Standard frameworks like JAX (or PyTorch) carry significant overhead at extreme scale: Python interpreter layers, generic abstractions, compiler passes (XLA), collective communication libraries that aren't perfectly tuned to one specific cluster topology, and scaling inefficiencies beyond roughly 10k–50k GPUs. Typical Model FLOPS Utilization (MFU) on frontier clusters today is roughly 50–67% even in highly optimized setups.
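To make the MFU figure concrete, here is a minimal sketch of how it is commonly estimated for a dense transformer, assuming the widely used approximation of about 6 FLOPs per parameter per token (forward plus backward pass). The function name, model size, throughput, and per-GPU peak are hypothetical placeholders chosen for round numbers, not figures from the announcement.

```python
def model_flops_utilization(params, tokens_per_second, num_gpus, peak_flops_per_gpu):
    """Estimate MFU as achieved model FLOP/s divided by aggregate peak FLOP/s.

    Assumes ~6 FLOPs per parameter per token (forward + backward)
    for a dense transformer; this is an approximation, not a measurement.
    """
    achieved = 6.0 * params * tokens_per_second
    peak = num_gpus * peak_flops_per_gpu
    return achieved / peak


# Hypothetical inputs, for illustration only.
mfu = model_flops_utilization(
    params=1.0e12,              # 1T parameters (placeholder)
    tokens_per_second=1.0e6,    # cluster-wide training throughput (placeholder)
    num_gpus=10_000,            # placeholder cluster size
    peak_flops_per_gpu=1.0e15,  # 1 PFLOP/s per GPU (placeholder)
)
print(f"MFU = {mfu:.0%}")
```

With these placeholder inputs the estimate lands at 60%, inside the 50–67% range quoted above.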
What the new stack does differently
Written in pure C → eliminates interpreter and high-level runtime overhead.
Maps exactly to the 220k-GB300 + 800G NIC topology → every GPU, every link, and every level of the memory hierarchy is known at compile time. No generic runtime discovery or indirection.
Heavy pipeline parallelism → model layers are split into pipeline stages across GPUs, with micro-batches flowing through like an assembly line, hand-tuned to hide communication latency behind computation.
Bare-metal kernels and custom collectives → direct hardware control, maximal bandwidth use on the 800G networking, and far higher sustained utilization (potentially 80%+ MFU at this scale).
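The pipeline-parallel idea in the list above can be sketched with a simple schedule model. This is an illustrative GPipe-style, forward-only schedule with equal-cost micro-batches, not the SpaceX implementation, whose actual schedule has not been described; the function names are hypothetical. With p stages and m micro-batches, the idle "bubble" fraction works out to (p - 1) / (m + p - 1), which is why many micro-batches are needed to keep every stage busy.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return a grid where schedule[t][s] is the micro-batch index that
    stage s processes at time step t, or None if the stage is idle."""
    total_steps = num_microbatches + num_stages - 1
    schedule = []
    for t in range(total_steps):
        row = []
        for s in range(num_stages):
            mb = t - s  # micro-batch mb reaches stage s at step mb + s
            row.append(mb if 0 <= mb < num_microbatches else None)
        schedule.append(row)
    return schedule


def bubble_fraction(num_stages, num_microbatches):
    """Fraction of (time step, stage) slots in which a stage sits idle."""
    schedule = pipeline_schedule(num_stages, num_microbatches)
    slots = sum(len(row) for row in schedule)
    idle = sum(cell is None for row in schedule for cell in row)
    return idle / slots


# Same 4-stage pipeline, increasing numbers of micro-batches.
for m in (4, 16, 64):
    print(f"4 stages, {m:>2} micro-batches: bubble = {bubble_fraction(4, m):.1%}")
```

In this toy model the idle fraction shrinks as micro-batches are added, which is the assembly-line effect described above. Real schedules also interleave backward passes and communication, which this sketch leaves out.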
How it applies to AI pretraining (and other training)
Pretraining is where it matters most: 80–95%+ of the compute for a new frontier LLM (like the next Grok foundation models) is spent in the initial pretraining phase on trillions of tokens. This stack is optimized exactly for that.
Other training (post-training, SFT, RL, supplemental/mid-training) will also benefit, but the gains are largest on the longest, most communication-heavy runs.
What it enables in practice
Train the same model >10× faster (what took 2–3 months could potentially take about 1 week).
Or train much larger models (10× more parameters or 10× more data) in the same calendar time.
Run many more parallel experiments simultaneously on the Colossus clusters.
xAI/SpaceXAI already trains multiple large models at once on Colossus 2 (and beyond). This stack will supercharge that flywheel.
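The timeline claim above is simple division, shown here as a small sketch. It assumes the full speedup applies uniformly to wall-clock time and takes a month as 30 days; the helper name is hypothetical.

```python
def accelerated_days(baseline_days, speedup):
    """Wall-clock days after applying a uniform speedup factor."""
    return baseline_days / speedup


for months in (2, 3):
    baseline = months * 30  # assumption: 30-day months
    print(f"{months} months -> {accelerated_days(baseline, 10):.0f} days at 10x")
```

Under these assumptions a 2–3 month run shrinks to 6–9 days, consistent with the "about 1 week" figure.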
Old vs. new timelines for new models
Old timelines (pre-this-stack, using previous JAX/custom frameworks on the rapidly expanding Colossus clusters)
Recent example: Grok V9-Medium (1.5T parameters) finished pretraining around May 25, 2026, with full release expected in 2–3 weeks after fine-tuning and RL.
Earlier projections (early May 2026)
Grok 5 variants (up to 6–10T parameters) had pretraining completion targeted roughly 2 months out (July 2026 timeframe), followed by post-training.
With the new stack, this could mean 10-trillion-parameter models are pretrained in 1–2 weeks, followed by 2–3 weeks of fine-tuning and reinforcement learning.

Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technologies and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Known for identifying cutting edge technologies, he is currently a Co-Founder of a startup and fundraiser for high potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and guest at numerous interviews for radio and podcasts. He is open to public speaking and advising engagements.