publication
Measuring Massive Multitask Language Understanding (MMLU)
Metadata
| Source Table | publications |
| Source ID | QNbZISNZtJ |
| Description | Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt, 2020-09 |
| Source URL | arxiv.org/abs/2009.03300 |
| Parent | Center for AI Safety (CAIS) |
| Children | — |
| Created | Mar 23, 2026, 2:46 PM |
| Updated | Mar 23, 2026, 2:46 PM |
| Synced | Mar 23, 2026, 2:46 PM |
Record Data
| id | QNbZISNZtJ |
| entityId | Center for AI Safety (CAIS)(organization) |
| entityDisplayName | — |
| resourceId | — |
| title | Measuring Massive Multitask Language Understanding (MMLU) |
| authors | Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt |
| url | arxiv.org/abs/2009.03300 |
| venue | — |
| publishedDate | 2020-09 |
| publicationType | paper |
| citationCount | — |
| isFlagship | Yes |
| abstract | — |
| source | arxiv.org/abs/2009.03300 |
| notes | Most widely used AI capability benchmark. ICLR 2021. |
Source Check Verdicts
confirmed (95% confidence)
Last checked: 4/3/2026
All key fields in the record are confirmed by the source text. The title, all seven authors in the correct order, and the publication date (2020-09, corresponding to the arXiv submission in September 2020) are explicitly stated. The URL matches the standard pattern for arXiv papers. The publication type 'paper' is appropriate for an arXiv preprint. No contradictions or discrepancies were found.