xAI's Grok 4 ranks third on the LMArena leaderboard, behind Google Gemini and OpenAI o3.
Grok-4 was tested with real-world prompts across domains such as coding, math, and creative writing.
It ranks Top-3 across the board:
➗ Math: #1
💻 Coding: #2
✍️ Creative Writing: #2
📋 Instruction Following: #2
🧠 Hard Prompts: #3

LMArena also tested Grok-4 with a system prompt supplied by xAI.
Results:
– grok-4-0709 (no system prompt): 1433 (+9/-10)
– grok-4-0709 (system prompt): 1422 (+7/-8)
LMArena deprecated the system-prompt version.
The grok-4-0709 model served on Arena runs without the system prompt.
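
To put the two scores in context, here is a rough Python sketch that turns each reported score and its (+/-) margin into an interval, checks whether the intervals overlap, and converts the 11-point gap into an expected win rate. It assumes the (+/-) values are confidence bounds around the score and that the scores behave like standard Elo ratings (base 10, scale 400). Both are assumptions for illustration, not a description of LMArena's exact method, and the function names are hypothetical.

```python
def interval(score, plus, minus):
    """Return (low, high) for a score reported as `score (+plus/-minus)`."""
    return score - minus, score + plus


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def elo_expected_score(rating_a, rating_b):
    """Expected win rate of A over B under the standard Elo formula.

    Assumption: base-10 logistic with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Scores as reported in the post.
no_prompt = interval(1433, 9, 10)    # (1423, 1442)
with_prompt = interval(1422, 7, 8)   # (1414, 1429)

print("no system prompt:  ", no_prompt)
print("with system prompt:", with_prompt)
print("intervals overlap: ", intervals_overlap(no_prompt, with_prompt))
print("expected win rate: ", round(elo_expected_score(1433, 1422), 3))
```

Under these assumptions the two intervals overlap (1423 to 1429), and an 11-point gap corresponds to an expected win rate of roughly 51.6% for the version without the system prompt, so the difference between the two variants is small.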
xAI will need its coding version of Grok to move up the coding rankings, and its multimodal version to move up in Vision and other categories.


Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technology and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Gemini 2.5 Flash ranking #6 is very impressive considering how fast it is (it’s seriously fast) and how cheap it is.
Scaling up GPUs and raw compute did not give as much return as one would think. I compared Grok 4 with o3 and Gemini 2.5, and Grok 4 gave worse answers. Still, it is a good product and competitive. In the more GPU-intensive think mode, it gives more.