Elon Musk confirmed to me, Brian Wang, that the current beta model is the ~500B parameter base model. The overall early consensus from testers is that it beats or matches frontier models (GPT-5, Claude 4/Opus 4.5, Gemini 3) in practical coding, simulations, iterative work, and real-world agentic tasks. xAI Grok 4.20 will scale to 16 agents in Heavy mode.
Its provisional LMSYS/Arena Elo is ~1505–1535 (Grok 4.1 Thinking was 1483), and xAI Grok 4.20 is projected to take #1 once fully ranked. Heavy mode is expected to add +30 to +80 Elo on hard tasks, which puts a realistic range for Heavy at ~1540–1610+.
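For readers less used to Elo numbers, here is a rough sketch of what these figures imply. It assumes classic Elo math (logistic curve, 400-point scale). Arena's actual leaderboard fit may differ in detail. The function names are illustrative only and do not come from any xAI or Arena tool.

```python
# Illustrative sketch only: classic Elo arithmetic applied to the quoted figures.
# Assumption: Arena scores behave like standard Elo ratings on a 400-point scale.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Classic Elo expected score for A against B (win probability, ties counted as half)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def heavy_range(base_low: float, base_high: float,
                boost_low: float = 30, boost_high: float = 80) -> tuple:
    """Shift the provisional base range by the quoted Heavy-mode boost."""
    return base_low + boost_low, base_high + boost_high


low, high = heavy_range(1505, 1535)
print(low, high)  # 1535 1615

# What a +30 or +80 Elo edge means head to head against an equal base model.
for gap in (30, 80):
    print(gap, round(expected_score(1500 + gap, 1500), 3))  # 30 0.543, then 80 0.613
```

The naive sum of 1535 to 1615 lines up with the ~1540–1610+ estimate above. A +30 Elo edge works out to winning about 54% of head-to-head comparisons, and +80 to about 61%.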

xAI Grok 4.20 lets you track about 200 active queries. I created my own dashboard, and you can easily set its updates to run on whatever schedule you want.
xAI Grok 4.20 built a good flight simulator and did far better than xAI Grok 4.1 on most of the tests.
Rapid weekly learning: it improves every week during the beta, with public release notes (the first model to do this at scale).
Dramatically lower hallucinations via internal cross-validation.
Much faster inference + better multimodal (especially medical image/file analysis for second opinions).
Stronger open-ended engineering reasoning, iterative coding, simulations, and agentic tasks.

Its unique edges are real-time X data, lower censorship, built-in team intelligence, and rapid weekly improvements.
It is still an early beta with no full public benchmark suite yet, but hands-on and trading results are extremely strong.
This is the first model that genuinely feels like working with a small expert team instead of one smart assistant.
Wes Roth likes the 4-agent system. It is a completely different paradigm, where multi-agent collaboration beats single-model reasoning on hard tasks.
00:00 – Intro
01:00 – First Look
02:05 – Browser OS Test
07:32 – 3D Printer Simulation Test
09:27 – Romance Novel Creative Writing Test
14:13 – Wireframe to Website Test
15:28 – Anthropic Application Portfolio Test
17:22 – Flight Combat Simulator Test
19:07 – Python 3D FPS Test
20:03 – Subway Station Scene Test
20:47 – C++ Skateboard Game Test
22:54 – Research & Design Capability Test
25:10 – Model Impressions
26:06 – Results Overview
Excellent for business automation & coding; multi-agent feels like “having a full team.”
Jengo says it beat GPT-5 and Claude 4, and highlights its Alpha Arena win and real-time X advantage.

Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technologies and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Known for identifying cutting edge technologies, he is currently a Co-Founder of a startup and fundraiser for high potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and guest at numerous interviews for radio and podcasts. He is open to public speaking and advising engagements.