XAI Grok 4 is Third on AI Leaderboard

xAI's Grok 4 ranks third on the LMArena leaderboard, behind Google Gemini and OpenAI o3.

Grok-4 was tested with real-world prompts across domains such as coding, math, and creative writing.

It ranks in the top three across the board:

➗ Math: #1
💻 Coding: #2
✍️ Creative Writing: #2
📋 Instruction Following: #2
🧠 Hard Prompts: #3

LMArena also tested Grok-4 with a system prompt supplied by xAI.

Results:
– grok-4-0709 (no system prompt): 1433 (+9/-10)
– grok-4-0709 (system prompt): 1422 (+7/-8)
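
To make the two scores easier to compare, here is a minimal Python sketch that turns each "score (+upper/-lower)" entry into an interval and checks whether the two intervals overlap. It is only an illustration: the names `ArenaScore`, `intervals_overlap`, and `elo_expected_win_rate` are hypothetical, and the win-rate formula assumes the scores sit on a standard Elo-style scale (base 10, 400 points), which the results above do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArenaScore:
    """One leaderboard entry written as: score (+plus/-minus)."""
    label: str
    score: float
    plus: float   # upper margin reported next to the score
    minus: float  # lower margin reported next to the score

    @property
    def interval(self) -> tuple:
        # (lower bound, upper bound) implied by the reported margins
        return (self.score - self.minus, self.score + self.plus)


def intervals_overlap(a: ArenaScore, b: ArenaScore) -> bool:
    """True when the two reported intervals share at least one point."""
    a_lo, a_hi = a.interval
    b_lo, b_hi = b.interval
    return a_lo <= b_hi and b_lo <= a_hi


def elo_expected_win_rate(diff: float) -> float:
    """ASSUMPTION: standard Elo scale (base 10, 400 points).
    Returns the expected win rate of the higher-rated entry."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


# Values copied from the results listed above.
no_prompt = ArenaScore("grok-4-0709 (no system prompt)", 1433, 9, 10)
with_prompt = ArenaScore("grok-4-0709 (system prompt)", 1422, 7, 8)

print(no_prompt.interval)    # (1423, 1442)
print(with_prompt.interval)  # (1414, 1429)
print(intervals_overlap(no_prompt, with_prompt))
print(round(elo_expected_win_rate(no_prompt.score - with_prompt.score), 3))
```

Under these assumptions the two intervals overlap, so the 11-point gap should be read as a small difference and not a decisive one.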

LMArena has deprecated the system-prompt version.

The grok-4-0709 model on the Arena is now served without the system prompt.

xAI will need a coding version of Grok to move up the coding rankings, and a multimodal version to move up in Vision and other categories.

2 thoughts on “XAI Grok 4 is Third on AI Leaderboard”

  1. It is very impressive that Gemini 2.5 Flash is #6, considering how fast it is (it’s seriously fast) and how cheap it is.

  2. Scaling up GPUs and raw compute did not give as much return as one would think. I compared Grok 4 with o3 and Gemini 2.5, and Grok 4 gave worse answers. Still, it is a good and competitive product. In the more GPU-intensive think mode it delivers more.
