Longterm Wiki

ARC-AGI-2

Category: Reasoning

Second iteration of the ARC benchmark with harder tasks, designed to remain challenging as AI capabilities improve.

Models Tested: 1
Best Score: 68.8%
Median Score: 68.8%
Scoring: accuracy
Introduced: 2025-01
Maintainer: François Chollet / ARC Prize Foundation

Leaderboard (1 model)

#    Model              Developer    Score
🥇   Claude Opus 4.6    Anthropic    68.8%