ARC-AGI-2
Reasoning
Second iteration of the ARC benchmark with harder tasks, designed to remain challenging as AI capabilities improve.
Models Tested: 1
Best Score: 68.8%
Median Score: 68.8%
Scoring: accuracy
Introduced: 2025-01
Maintainer: François Chollet / ARC Prize Foundation
Leaderboard (1 model)
| # | Model | Developer | Score |
|---|---|---|---|
| 🥇 | Claude Opus 4.6 | Anthropic | 68.8% |