MMLU-Pro

Metadata

Source Table: benchmarks
Source ID: 3PM0ZfVJxU
Description: A harder variant of MMLU with 10 answer choices (vs 4), chain-of-thought reasoning, and reduced sensitivity to prompt format. Designed to better discriminate among top models.
Source URL: github.com/TIGER-AI-Lab/MMLU-Pro
Wiki ID: mmlu-pro
Children
Created: Mar 14, 2026, 12:43 AM
Updated: Mar 24, 2026, 11:24 PM
Synced: Mar 24, 2026, 11:24 PM

Record Data

id: 3PM0ZfVJxU
slug: mmlu-pro
name: MMLU-Pro
category: knowledge
description: A harder variant of MMLU with 10 answer choices (vs 4), chain-of-thought reasoning, and reduced sensitivity to prompt format. Designed to better discriminate among top models.
website: github.com/TIGER-AI-Lab/MMLU-Pro
scoringMethod: accuracy
higherIsBetter: Yes
introducedDate: 2024-06
maintainer: TIGER Lab
source: arxiv.org/abs/2406.01574
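
Scoring sketch

The record lists accuracy as the scoring method and notes the move from 4 to 10 answer choices. A minimal sketch of what that implies is given here, assuming every question has exactly 10 options labeled A through J and that a prediction is a single letter. The function names and sample data are hypothetical illustrations and are not taken from the MMLU-Pro repository.

```python
def accuracy(predictions, answers):
    """Fraction of questions where the predicted letter matches the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def chance_accuracy(num_choices):
    """Expected accuracy of uniform random guessing over num_choices options."""
    if num_choices < 1:
        raise ValueError("num_choices must be positive")
    return 1 / num_choices


# Hypothetical answer key and model predictions (assumed data, not real results).
answers = ["A", "J", "C", "F"]
predictions = ["A", "J", "D", "F"]

print(accuracy(predictions, answers))            # 0.75
print(chance_accuracy(10), chance_accuracy(4))   # 0.1 0.25
```

Under this assumption the random-guess floor drops from 0.25 to 0.1, so a given accuracy leaves less room for lucky guesses, which is one way the larger choice set can help separate strong models.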