Longterm Wiki
benchmark

HellaSwag

Metadata

Source Table: benchmarks
Source ID: nD2CFoyeBf
Description: A commonsense natural language inference benchmark that tests whether models can pick the most plausible continuation of a scenario. Its incorrect endings are selected by adversarial filtering against language models.
Wiki ID: hellaswag
Children:
Created: Mar 14, 2026, 12:43 AM
Updated: Mar 24, 2026, 11:24 PM
Synced: Mar 24, 2026, 11:24 PM

Record Data

id: nD2CFoyeBf
slug: hellaswag
name: HellaSwag
category: reasoning
description: A commonsense natural language inference benchmark that tests whether models can pick the most plausible continuation of a scenario. Its incorrect endings are selected by adversarial filtering against language models.
website:
scoringMethod: accuracy
higherIsBetter: Yes
introducedDate: 2019-05
maintainer: AI2 / University of Washington
source: arxiv.org/abs/1905.07830
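
The accuracy scoring named in this record can be illustrated with a minimal sketch. It assumes each item has four candidate endings and that a model assigns one score per ending (for example a log-likelihood); the function names and the example numbers are hypothetical and are not taken from the benchmark's own tooling.

```python
from typing import Sequence


def predict(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate ending."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(all_scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of items whose top-scoring ending matches the gold label."""
    if len(all_scores) != len(labels):
        raise ValueError("need exactly one label per item")
    if not labels:
        raise ValueError("need at least one item")
    correct = sum(predict(s) == y for s, y in zip(all_scores, labels))
    return correct / len(labels)


# Hypothetical model scores for three items, four endings each.
example_scores = [
    [-12.1, -9.4, -15.0, -11.2],  # highest score at index 1
    [-8.3, -8.9, -7.1, -10.5],    # highest score at index 2
    [-5.0, -6.2, -6.1, -4.9],     # highest score at index 3
]
example_labels = [1, 2, 0]  # hypothetical gold ending indices

if __name__ == "__main__":
    print(f"accuracy = {accuracy(example_scores, example_labels):.3f}")
```

Higher is better here, matching the higherIsBetter field: the model is credited only when its top-ranked ending is the gold one. How the per-ending scores are produced (raw or length-normalized likelihoods, for instance) is left open in this sketch.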