Longterm Wiki
benchmark-result

Claude Opus 4.1 on GPQA Diamond: 80.9%

Child of GPQA Diamond

Metadata

Source Table: benchmark_results
Source ID: Rsc7oluuId
Parent: GPQA Diamond
Children: none
Created: Apr 24, 2026, 6:46 PM
Updated: Apr 24, 2026, 6:46 PM
Synced: Apr 24, 2026, 6:46 PM

Record Data

id: Rsc7oluuId
benchmarkId: bdDmOTMoX8
modelId: Claude Opus 4.1 (ai-model)
score: 80.9
unit: percent
date: 2025-08-05
sourceUrl: (none)
notes: Graduate-level reasoning benchmark. Reported with extended thinking mode (up to 64K tokens).
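For illustration only, the record above could be modeled as a typed row. This is a minimal sketch under stated assumptions: the `BenchmarkResult` class and its `as_fraction` helper are hypothetical, the field types (float score, `datetime.date` date, `None` for the empty sourceUrl) are guesses rather than the wiki's actual schema, and the model's display name stands in for its underlying ID, which the page does not show.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkResult:
    # Hypothetical class: field names mirror the record, types are assumed.
    id: str
    benchmarkId: str
    modelId: str  # display name used here; the raw model ID is not shown on the page
    score: float
    unit: str
    date: datetime.date
    sourceUrl: Optional[str]  # assumed: an empty sourceUrl maps to None
    notes: str

    def as_fraction(self) -> float:
        # Convert a percent-unit score to a value in [0, 1].
        if self.unit != "percent":
            raise ValueError(f"unsupported unit: {self.unit}")
        return self.score / 100.0


result = BenchmarkResult(
    id="Rsc7oluuId",
    benchmarkId="bdDmOTMoX8",
    modelId="Claude Opus 4.1",
    score=80.9,
    unit="percent",
    date=datetime.date(2025, 8, 5),
    sourceUrl=None,
    notes="Graduate-level reasoning benchmark. Reported with extended thinking mode (up to 64K tokens).",
)
print(result.as_fraction())
```

Keeping the unit check inside `as_fraction` means a record stored in some other unit fails loudly instead of being silently divided by 100.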

Source Check Verdicts

Verdict: confirmed (95% confidence)

Last checked: 4/24/2026

Inline sourcing: confirmed

Debug info

Thing ID: Rsc7oluuId

Source Table: benchmark_results

Source ID: Rsc7oluuId

Parent Thing ID: bdDmOTMoX8