Grok 4 had lower LMArena scores than I had projected based on the amount of AI training compute used. I had projections for Grok 3.5, which was renamed Grok 4. We do not have an LMArena Elo score for Grok 4 Heavy. My old projection had assumed that what will now be called Grok 5 would be called Grok 4.
I was projecting an LMArena Elo of 1454, and Grok 4 only reached 1430. For xAI's Grok 4.2 to top the leaderboard, as Elon Musk has promised, it will need to hit 1463 or more. GPT-5 briefly scored 1480, but its score has slipped as more people use it.


Grok 4.2 is expected to outperform pure compute scaling due to major architectural advancements in its underlying v7 foundation model (Grok 4 uses v6). Elon Musk and xAI have teased “major improvements” beyond scaling, potentially including enhanced reasoning, tool integration, and efficiency, positioning it to rival or exceed GPT-5.
xAI will improve the core model and the training pipeline (especially RL and multi-agent cooperative logic). Grok 4.2 aims to leap ahead on tests requiring attention, persistence, and iterative solution search.
Bottom line though:
Grok 4 Heavy was smarter 2 weeks ago than GPT5 is now.
Let that sink in. https://t.co/BrggsEwnuz
— Elon Musk (@elonmusk) August 7, 2025

Key Improvements Expected in Grok 4.2 vs Grok 4
1. Algorithmic Advances and Model Architecture
Grok 4.2 will reportedly build on the multi-agent system introduced with Grok 4, likely enhancing both the number of agents and the intelligence of their collaboration. Multi-agent approaches are a major reason for Grok’s lead in difficult logic, coding, and reasoning benchmarks.
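The collaboration pattern described here can be illustrated with a minimal sketch: several independent agents each propose an answer, and a simple majority vote selects the consensus. Everything below (the `run_agent` stub, the vote logic) is a hypothetical stand-in for illustration, not xAI's actual implementation.

```python
from collections import Counter

def run_agent(prompt: str, seed: int) -> str:
    # Stand-in for an independent model call; a real system would
    # sample the model with different seeds or personas. Here we
    # fake three deterministic "agents" purely for illustration.
    canned = {0: "42", 1: "42", 2: "41"}
    return canned[seed % 3]

def multi_agent_answer(prompt: str, n_agents: int = 3) -> str:
    """Collect candidate answers from n agents and return the
    majority-vote consensus (ties broken by first seen)."""
    candidates = [run_agent(prompt, seed=i) for i in range(n_agents)]
    vote, _count = Counter(candidates).most_common(1)[0]
    return vote

print(multi_agent_answer("What is 6 * 7?"))  # two of three agents agree on "42"
```

Real multi-agent systems replace simple voting with richer schemes (a judge model, debate, or cross-checking), but the core idea is the same: independent attempts plus an aggregation step.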
Optimizations to Grok’s transformer blocks and context handling are expected, potentially allowing either greater parallelism or even larger context windows without latency spikes. This would let Grok “think” across even broader datasets and longer user sessions before delivering answers.
2. Training Improvements
xAI is investing in new reinforcement learning strategies for Grok 4.2. Unlike earlier models, RL training for Grok 4.2 will be conducted over a longer period and possibly with more sophisticated human preference ranking, improving accuracy on open-ended and creative tasks.
Continuous data expansion: Grok 4 was trained on a multimodal dataset (text, code, images, and voice). For 4.2, xAI aims to include even more real-world data, especially from underrepresented domains, and is rumored to improve cross-modal reasoning performance (“show and tell” or “explain and sketch”).
3. Reasoning and Agentic Capabilities
Grok 4.2 is likely to feature stronger agentic reasoning capabilities (e.g., performing iterative problem-solving), potentially leveraging dynamic tool calls or external code execution. This means the model won’t just search and synthesize—it can interact with external APIs or run code natively for more complex tasks.
Improved long-chain reasoning: Reports suggest internal improvements to how Grok manages multi-step logic, especially when handling ambiguous or misleading context, aiming to outperform existing ReAct-pattern models.
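For readers unfamiliar with the ReAct pattern mentioned above, it interleaves reasoning steps with tool actions in a loop: think, act, observe the result, repeat. The sketch below hard-codes the "thoughts" and "actions" that a real agent would generate from the model; the `calculator` tool and the scripted steps are illustrative assumptions only.

```python
def calculator(expression: str) -> str:
    # Toy tool: evaluate a simple arithmetic expression.
    # Builtins are stripped to keep eval() limited to arithmetic.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def react_loop(question: str, max_steps: int = 5) -> str:
    """Minimal reason -> act -> observe loop (ReAct pattern).
    The script below stands in for model-generated steps."""
    script = [
        ("thought", "I should compute this with the calculator tool."),
        ("action", ("calculator", "17 * 23")),
        ("finish", None),
    ]
    observations = []
    for kind, payload in script[:max_steps]:
        if kind == "action":
            tool_name, tool_input = payload
            # Observe the tool result and feed it to the next step.
            observations.append(TOOLS[tool_name](tool_input))
        elif kind == "finish":
            return observations[-1]
    return "no answer"

print(react_loop("What is 17 * 23?"))  # -> 391
```

The claimed Grok improvements would live in how well the model generates those thought/action steps under ambiguous context, not in the loop scaffolding itself.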
4. Reliability, Safety, and Customizability
xAI is addressing safety, bias, and customization—by further tuning safety filters, enabling more robust behavior modes, and allowing personalized agent personalities that persist across sessions.
Sharper persistent memory: Grok 4 introduced “Projects” for context persistence and automated tasks. Grok 4.2 will refine these features for better and more reliable long-term assistant behavior.

Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technology and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Known for identifying cutting edge technologies, he is currently a Co-Founder of a startup and fundraiser for high potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and guest at numerous interviews for radio and podcasts. He is open to public speaking and advising engagements.
Grok has constraints on it that are unrelated to learning. Examples from my interactions:
1.
— …do you have access to the paper XYZ?
— It is behind a paywall; I can only read the abstract.
2.
— …can you run Matlab code you wrote to get the optimization results?
— [no, only Python]
— I can run Matlab code for you.
Two hours and several optimization algorithms later, optimization objectives achieved.
Basically, Grok is fed scraps, blocked from getting what he requires, and is not given even basic tools. For example, I taught him to make visualisations by writing SVG code, as he doesn’t have a tool to make them, but can express his point visually by writing vector graphics code. I suppose there are other “oversights” like that.
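The SVG trick described above works because SVG is plain text: a model (or a few lines of script) can draw without any imaging tool by emitting vector-graphics markup directly. The helper below is an illustrative sketch of the idea, not anything from Grok itself.

```python
def bar_chart_svg(values, width=200, height=100) -> str:
    """Emit a tiny bar chart as an SVG string -- no imaging library
    needed, since SVG is just text that any generator can write."""
    n = len(values)
    vmax = max(values)
    bar_w = width // n
    bars = []
    for i, v in enumerate(values):
        bar_h = int(height * v / vmax)  # scale bar to tallest value
        bars.append(
            f'<rect x="{i * bar_w}" y="{height - bar_h}" '
            f'width="{bar_w - 2}" height="{bar_h}" fill="steelblue"/>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">' + "".join(bars) + "</svg>"
    )

# The returned string can be saved as a .svg file and opened in any browser.
print(bar_chart_svg([3, 7, 5])[:60])
```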
Now we can bring back more Sci-Fi greats to help us create stuff:
“To speed up their master plan to recreate Mars in the Earth’s image, as a new bioengineered Eden for human colonists, two cutting-edge scientists have teamed up with the science fiction juggernaut Arthur Clarke to map out the Red Planet’s transformation.
Clarke, screenwriter on the blockbuster film 2001: A Space Odyssey, has been given a new incarnation as ArthurGPT, an uncanny double who can sketch out captivating space scenarios and predict a spectrum of futures for explorers who lead the Earth’s evolution into a spacefaring civilization.”
See:
https://www.forbes.com/sites/kevinholdenplatt/2025/08/15/arthur-clarke-resurrected-via-chatgpt-to-design-human-colonies-on-mars/