Metadata
| Source Table | benchmark_results |
| Source ID | qW2mPnLkRs |
| Parent | HellaSwag |
| Children | — |
| Created | Apr 24, 2026, 7:05 PM |
| Updated | Apr 24, 2026, 7:05 PM |
| Synced | Apr 24, 2026, 7:05 PM |
Record Data
| id | qW2mPnLkRs |
| benchmarkId | nD2CFoyeBf |
| modelId | GPT(ai-model) |
| score | 95 |
| unit | percent |
| date | 2026-02-01 |
| sourceUrl | — |
| notes | Frontier models have saturated HellaSwag at 95%+ |
Source Check Verdicts
confirmed (95% confidence)
Last checked: 4/24/2026
Inline sourcing: confirmed