On LMArena, Grok 4.1 (Thinking) ranks first and Grok 4.1 ranks second. In the earlier benchmark tests, Grok 4.1 (Thinking) led with a score of 1510. It is still first, now with a score of 1483.
Hallucinations are massively reduced. The rate drops from 12% to about 4%, roughly a two-thirds relative reduction.
This version scores more than 40 points higher than Grok 4 Fast, which was released two months ago.
It was silently rolled out to a subset of users from November 1–14, 2025, and it was preferred over prior models in 64.78% of blind pairwise evaluations on live traffic.
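To put the 64.78% preference figure on the same footing as leaderboard scores, here is a minimal sketch that converts a pairwise win rate into an implied rating gap. It assumes the standard logistic Elo relationship (400 points per factor of 10 in odds), which is an assumption made here for illustration. The article does not say that xAI computed anything like this, and the function names are hypothetical. The live-traffic comparison was against prior production models, so the result is not directly comparable to LMArena scores.

```python
import math


def elo_gap_from_win_rate(p: float) -> float:
    """Rating gap (in Elo points) implied by win probability p.

    Assumes the standard logistic Elo model: p = 1 / (1 + 10 ** (-gap / 400)).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


def win_rate_from_elo_gap(gap: float) -> float:
    """Inverse of the above: expected win probability for a given rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


# 64.78% is the blind pairwise preference rate quoted in the article.
preference = 0.6478
gap = elo_gap_from_win_rate(preference)
print(f"implied gap: {gap:.1f} Elo points")
```

Under this assumption, a 64.78% preference rate corresponds to a gap of a little over 100 rating points between the new model and the ones it replaced.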
It is the top released model for creative writing.
OpenAI's GPT 5.1, which has not been released yet, could have a slightly higher creative writing score.
Google will soon release Gemini 3. It is expected to be very good.
XAI's Grok 5 will be released in Q1 2026 with double the parameters and will be a major step up.
Grok 4.1 Specifications and Benchmarks
Grok 4.1 is positioned as a frontier model emphasizing conversational intelligence, emotional understanding, real-world helpfulness, and reduced errors. It’s available immediately to all users (including free tier) on grok.com, x.com (via login), and the Grok iOS/Android apps.
Model Variants
Grok 4.1 Thinking (code name quasarflux). It uses thinking tokens for step-by-step reasoning in responses.
Grok 4.1 Non-Reasoning (code name tensor). It has direct responses without thinking tokens for speed.
Grok 4.1 Fast (Non-Reasoning). It includes integrated search tools for quick, fact-checked answers.
1 thought on “XAI Releases Grok 4.1 and It Tops the LMArena Leaderboard”
At this point in time, the entire GROK ecosystem seems to be down.
No responses to prompts, all chat history lost.
Hopefully just a temporary overload and not GROK 4.1 going full Terminator on us.