ARC-AGI-2

Reasoning

The second iteration of the ARC benchmark, with harder tasks designed to remain challenging as AI capabilities improve.

Models Tested

2

Best Score

77.1

Median Score

72.95

Scoring: accuracy

Introduced: 2025-01

Maintainer: Francois Chollet / ARC Prize Foundation

Leaderboard (2 models)

#	Model	Developer	Score (%)
🥇	Gemini	Google DeepMind	77.1
🥈	Claude Opus 4.6	Anthropic	68.8
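
The summary figures at the top of the page can be reproduced from the leaderboard rows. The snippet below is a minimal sketch, not the site's actual method: it assumes both scores are percent accuracy on the same scale, and the variable names (`leaderboard`, `models_tested`, `best_score`, `median_score`) are illustrative only.

```python
from statistics import median

# Rows copied from the leaderboard above.
# Assumption: both scores are percent accuracy on the same scale.
leaderboard = [
    ("Gemini", "Google DeepMind", 77.1),
    ("Claude Opus 4.6", "Anthropic", 68.8),
]

scores = [score for _, _, score in leaderboard]

models_tested = len(scores)
best_score = max(scores)
# With an even number of entries, the median is the mean of the two
# middle values; round to two decimals to avoid floating-point noise.
median_score = round(median(scores), 2)

print(models_tested, best_score, median_score)
```

With only two entries, the median is the average of the two scores, which is why it falls exactly halfway between them.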