benchmark
MMLU
Metadata
| Source Table | benchmarks |
| Source ID | izV3Xk98se |
| Description | Massive Multitask Language Understanding: a multiple-choice benchmark covering 57 academic subjects, from STEM to the humanities. |
| Source URL | github.com/hendrycks/test |
| Wiki ID | mmlu |
| Children | — |
| Created | Mar 14, 2026, 12:43 AM |
| Updated | Mar 24, 2026, 11:24 PM |
| Synced | Mar 24, 2026, 11:24 PM |
Record Data
| id | izV3Xk98se |
| slug | mmlu |
| name | MMLU |
| category | knowledge |
| description | Massive Multitask Language Understanding: a multiple-choice benchmark covering 57 academic subjects, from STEM to the humanities. |
| website | github.com/hendrycks/test |
| scoringMethod | accuracy |
| higherIsBetter | Yes |
| introducedDate | 2021-01 |
| maintainer | Dan Hendrycks et al. |
| source | arxiv.org/abs/2009.03300 |
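The record lists accuracy as the scoring method, with higher values being better. As a rough illustration only, the sketch below computes accuracy over multiple-choice answers. The function name `accuracy`, the letter-based answer encoding, and the sample data are all assumptions made for this example and are not taken from the benchmark's own evaluation code.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of questions where the predicted choice matches the key.

    Hypothetical helper: choices are assumed to be letters such as "A".."D",
    compared case-insensitively after stripping whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty set of questions")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Made-up predictions and answer key, for illustration only.
preds = ["A", "c", "B", "D"]
gold = ["A", "C", "D", "D"]
print(f"accuracy = {accuracy(preds, gold):.2f}")
```

Because higherIsBetter is Yes, a model scoring 0.75 under this measure would rank above one scoring 0.50.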