Longterm Wiki
benchmark-result

GPT on HellaSwag: 95

Child of HellaSwag

Metadata

Source Table: benchmark_results
Source ID: qW2mPnLkRs
Parent: HellaSwag
Children:
Created: Apr 24, 2026, 7:05 PM
Updated: Apr 24, 2026, 7:05 PM
Synced: Apr 24, 2026, 7:05 PM

Record Data

id: qW2mPnLkRs
benchmarkId: nD2CFoyeBf
modelId: GPT (ai-model)
score: 95
unit: percent
date: 2026-02-01
sourceUrl:
notes: Frontier models have saturated HellaSwag at 95%+

Source Check Verdicts

confirmed (95% confidence)

Last checked: 4/24/2026

Inline sourcing: confirmed

Debug info

Thing ID: qW2mPnLkRs

Source Table: benchmark_results

Source ID: qW2mPnLkRs

Parent Thing ID: nD2CFoyeBf