Longterm Wiki
Fact · f_mGXpFffUh7

Center for AI Safety (CAIS) — publication: Measuring Massive Multitask Language Understanding (MMLU) — widely-used benchmark for evaluating LLM capabilities across 57 academic subjects

Verdict: partial · 95%
1 check · 4/16/2026

The source confirms the benchmark covers 57 subjects and was created by Hendrycks et al. However, the claim attributes this publication to 'Center for AI Safety (CAIS)' as the publisher/creator. The source text shows the authors are affiliated with UC Berkeley, Columbia University, UChicago, and UIUC — NOT CAIS. While the paper may have been later associated with CAIS or the authors may have later joined CAIS, the source document itself does not identify CAIS as the publishing organization or creator. The claim's temporal marker 'as of 2021-01' is also slightly misaligned with the paper's September 2020 publication date (arXiv:2009.03300), though this is a minor discrepancy. The core facts about the benchmark (57 subjects, Hendrycks et al. authorship, widely-cited status) are confirmed, but the CAIS attribution is not supported by this source.

Our claim

entire record
Subject
Center for AI Safety (CAIS)
Value
Measuring Massive Multitask Language Understanding (MMLU) — widely-used benchmark for evaluating LLM capabilities across 57 academic subjects
As Of
January 2021
Notes
Created by Hendrycks et al.; became one of the most-cited AI benchmarks

Source evidence

1 src · 1 check
partial · 95% · primary · Haiku 4.5 · 4/16/2026


Case № f_mGXpFffUh7 · Filed 4/16/2026 · Confidence 95%