benchmark
MMLU-Pro
Metadata
| Source Table | benchmarks |
| Source ID | 3PM0ZfVJxU |
| Description | A harder variant of MMLU with 10 answer choices (vs 4), more reasoning-focused questions on which chain-of-thought prompting outperforms direct answering, and reduced sensitivity to prompt format. Designed to better discriminate among top models. |
| Source URL | github.com/TIGER-AI-Lab/MMLU-Pro |
| Wiki ID | mmlu-pro |
| Children | — |
| Created | Mar 14, 2026, 12:43 AM |
| Updated | Mar 24, 2026, 11:24 PM |
| Synced | Mar 24, 2026, 11:24 PM |
Record Data
| id | 3PM0ZfVJxU |
| slug | mmlu-pro |
| name | MMLU-Pro |
| category | knowledge |
| description | A harder variant of MMLU with 10 answer choices (vs 4), more reasoning-focused questions on which chain-of-thought prompting outperforms direct answering, and reduced sensitivity to prompt format. Designed to better discriminate among top models. |
| website | github.com/TIGER-AI-Lab/MMLU-Pro |
| scoringMethod | accuracy |
| higherIsBetter | Yes |
| introducedDate | 2024-06 |
| maintainer | TIGER Lab |
| source | arxiv.org/abs/2406.01574 |
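The `accuracy` scoring method over 10 answer choices can be illustrated with a minimal, hypothetical Python sketch. It is not TIGER Lab's evaluation code. It assumes each model completion ends with a phrase like "The answer is (C)", extracts that letter with a regular expression, counts unparseable outputs as wrong, and compares the result with the uniform random-guessing baseline. That baseline falls from 25% with 4 options to 10% with 10. The names `extract_choice`, `accuracy`, and `chance_accuracy` are illustrative only.

```python
import re
from typing import Optional, Sequence

# Options are labeled A through J (10 choices).
OPTION_LETTERS = "ABCDEFGHIJ"

# Hypothetical extraction pattern: assumes the model ends its chain of
# thought with a phrase like "The answer is (C)". Not the official script.
ANSWER_RE = re.compile(r"answer is \(?([A-J])\)?")


def extract_choice(completion: str) -> Optional[str]:
    """Return the last option letter the completion commits to, or None."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


def accuracy(completions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of items whose extracted letter equals the gold letter.

    Completions with no extractable letter count as wrong.
    """
    if len(completions) != len(gold):
        raise ValueError("completions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one item")
    correct = sum(extract_choice(c) == g for c, g in zip(completions, gold))
    return correct / len(gold)


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing among num_options."""
    return 1.0 / num_options


if __name__ == "__main__":
    preds = [
        "Step 1 ... so the answer is (C).",
        "Eliminating options, the answer is J",
        "I am not sure.",
        "The answer is (A).",
    ]
    gold = ["C", "J", "B", "D"]
    print(accuracy(preds, gold))  # 0.5
    print(chance_accuracy(4), chance_accuracy(len(OPTION_LETTERS)))  # 0.25 0.1
```

Counting unparseable completions as wrong keeps the score conservative. Comparing a model's accuracy against `chance_accuracy` shows how much headroom the larger option set adds over random guessing.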