xAI — Benchmark Score: 93.3
The source text states: 'We tested these models on the 2025 American Invitational Mathematics Examination (AIME), which was released just 7 days ago on Feb 12th. With our highest level of test-time compute (cons@64), Grok 3 (Think) achieved 93.3% on this competition.' This confirms the claim that xAI's Grok 3 scored 93.3% on AIME 2025 as of February 19, 2025 (the source's publication date, seven days after the February 12 release). The figure applies specifically to Grok 3 (Think) at its highest test-time compute setting (cons@64).
Our claim
- Subject
- xAI
- Property
- Benchmark Score
- Value
- 93.3
- As Of
- 2025
- Source
- https://x.ai/news/grok-3
- Notes
- Grok 3 AIME 2025 benchmark: 93.3% success rate
Source evidence