Metadata
| Source Table | benchmark_results |
| Source ID | Rsc7oluuId |
| Parent | GPQA Diamond |
| Children | — |
| Created | Apr 24, 2026, 6:46 PM |
| Updated | Apr 24, 2026, 6:46 PM |
| Synced | Apr 24, 2026, 6:46 PM |
Record Data
| id | Rsc7oluuId |
| benchmarkId | bdDmOTMoX8 |
| modelId | Claude Opus 4.1 (ai-model) |
| score | 80.9 |
| unit | percent |
| date | 2025-08-05 |
| sourceUrl | — |
| notes | Graduate-level reasoning benchmark. Reported with extended thinking mode (up to 64K tokens). |
Source Check Verdicts
confirmed (95% confidence)
Last checked: 4/24/2026
Inline sourcing: confirmed