Longterm Wiki
Fact·f_tHAA1W30dw·Fact

xAI — Benchmark Score: 93.3

Verdict: confirmed (99%)
1 check · 4/16/2026

The source text explicitly states: 'We tested these models on the 2025 American Invitational Mathematics Examination (AIME), which was released just 7 days ago on Feb 12th. With our highest level of test-time compute (cons@64), Grok 3 (Think) achieved 93.3% on this competition.' This directly confirms the claim that xAI's Grok 3 achieved a 93.3% benchmark score on AIME 2025 as of February 19, 2025 (the publication date of the source).
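The quoted score is reported at cons@64, which is commonly read as a consensus (majority-vote) metric: sample 64 answers per problem, take the most frequent one, and grade that single answer. The sketch below illustrates that reading only. The function names and toy data are hypothetical, it uses 5 samples per problem instead of 64, and the source does not describe xAI's actual evaluation code.

```python
from collections import Counter


def consensus_answer(samples):
    """Return the most frequent answer among the sampled answers (majority vote)."""
    return Counter(samples).most_common(1)[0][0]


def cons_at_k_score(samples_per_problem, reference_answers):
    """Fraction of problems whose majority-vote answer matches the reference."""
    correct = sum(
        consensus_answer(samples) == ref
        for samples, ref in zip(samples_per_problem, reference_answers)
    )
    return correct / len(reference_answers)


# Hypothetical toy data: 3 problems, 5 sampled answers each (k=5, not 64).
samples = [
    [204, 204, 17, 204, 204],   # majority 204, matches the reference
    [25, 113, 25, 25, 9],       # majority 25, matches the reference
    [70, 588, 588, 70, 70],     # majority 70, reference is 588, so wrong
]
answers = [204, 25, 588]

score = cons_at_k_score(samples, answers)
print(f"cons@5 score: {score:.1%}")
```

As an arithmetic aside, each AIME paper has 15 questions, so 93.3% is consistent with 14 of 15 or 28 of 30 correct; the quoted source text does not say which problem count was used.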

Our claim

Subject: xAI
Property: Benchmark Score
Value: 93.3
As Of: 2025
Notes: Grok 3 AIME 2025 benchmark: 93.3% success rate

Source evidence

1 src · 1 check
confirmed (99%) · primary · Haiku 4.5 · 4/16/2026

Note: The source text explicitly states: 'We tested these models on the 2025 American Invitational Mathematics Examination (AIME), which was released just 7 days ago on Feb 12th. With our highest level of test-time compute (cons@64), Grok 3 (Think) achieved 93.3% on this competition.' This directly confirms the claim that xAI's Grok 3 achieved a 93.3% benchmark score on AIME 2025 as of February 19, 2025 (the publication date of the source).

Case № f_tHAA1W30dw · Filed 4/16/2026 · Confidence 99%