Center for AI Safety (CAIS) — publication: Measuring Massive Multitask Language Understanding (MMLU) — widely-used benchmark for evaluating LLM capabilities across 57 academic subjects
The source confirms the benchmark covers 57 subjects and was created by Hendrycks et al. However, the claim attributes this publication to 'Center for AI Safety (CAIS)' as the publisher/creator. The source text shows the authors are affiliated with UC Berkeley, Columbia University, UChicago, and UIUC — NOT CAIS. While the paper may have been later associated with CAIS or the authors may have later joined CAIS, the source document itself does not identify CAIS as the publishing organization or creator. The claim's temporal marker 'as of 2021-01' is also slightly misaligned with the paper's September 2020 publication date (arXiv:2009.03300), though this is a minor discrepancy. The core facts about the benchmark (57 subjects, Hendrycks et al. authorship, widely-cited status) are confirmed, but the CAIS attribution is not supported by this source.
Our claim
- Subject
- Center for AI Safety (CAIS)
- Value
- Measuring Massive Multitask Language Understanding (MMLU) — widely-used benchmark for evaluating LLM capabilities across 57 academic subjects
- As Of
- January 2021
- Notes
- Created by Hendrycks et al.; became one of the most-cited AI benchmarks
Source evidence
1 src · 1 check