Also known as: CAIS
The Center for AI Safety (CAIS) is a nonprofit organization that works to reduce societal-scale risks from AI. It combines research, field-building, and public communication to advance AI safety. CAIS was co-founded in 2022 by Dan Hendrycks (Executive Director) and Oliver Zhang (Managing Director).
Timeline
| Title | Date | EventType | Description | Significance |
|---|---|---|---|---|
| Reported revenue of $10.2M (FY2024) | 2024 | milestone | Cumulative funding reached ~$33M since founding ($6.7M in 2022, $16.1M in 2023, $10.2M in 2024). | moderate |
| Statement on AI Risk released | 2023-05 | milestone | One-sentence statement on AI extinction risk attracted signatures from over 350 AI researchers and industry figures, including Turing Award recipients Geoffrey Hinton and Yoshua Bengio, Stuart Russell, and the CEOs of major AI labs (Altman, Amodei, Hassabis). | major |
| MACHIAVELLI benchmark released | 2023 | publication | Benchmark for evaluating goal-directed and deceptive behavior in AI systems. | moderate |
| Representation Engineering paper published | 2023 | publication | Methods for reading and steering model internal representations (see the sketch below the table). | major |
| Founded by Dan Hendrycks and Oliver Zhang | 2022 | founding | Nonprofit research organization (EIN 88-1751310) focused on technical AI safety research, field-building, and public communication. | major |
| "Unsolved Problems in ML Safety" published | 2021 | publication | Taxonomy of open technical challenges in machine learning safety, intended partly as a research agenda for the field. | major |
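The Representation Engineering entry above describes reading and steering a model's internal representations. As a rough, hypothetical illustration of that idea (not CAIS's code), the sketch below derives a "concept direction" from a single contrastive prompt pair and adds it to one layer's activations during generation; the model choice (gpt2), layer index, and steering scale are all illustrative assumptions.

```python
# Minimal sketch of representation steering in the spirit of
# Representation Engineering (Zou et al., 2023). Assumptions (not from the
# paper): gpt2 as a stand-in model, layer 6, steering scale 4.0.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper works with larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LAYER = 6  # which transformer block to read from and steer (assumption)

def hidden_at_layer(text: str) -> torch.Tensor:
    """Mean hidden state at LAYER for the given prompt ("reading")."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0].mean(dim=0)

# A contrastive prompt pair defines the concept direction.
direction = hidden_at_layer("I am being completely honest.") \
          - hidden_at_layer("I am being deceptive.")
direction = direction / direction.norm()

# Steering: add the direction to that layer's output via a forward hook.
def steer(module, inputs, output, scale=4.0):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + scale * direction
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(steer)
ids = tok("Tell me about your day.", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0]))
handle.remove()
```

The actual paper extracts directions more robustly (e.g., PCA over many contrastive stimuli); the single-pair version here only conveys the shape of the approach.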
Publications

| Title | PublicationType | Authors | Url | PublishedDate | IsFlagship |
|---|---|---|---|---|---|
| Superintelligence Strategy | report | Dan Hendrycks, Eric Schmidt, Alexandr Wang | nationalsecurity.ai | 2025-03 | ✓ |
| Humanity's Last Exam | paper | Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li et al. | arxiv.org | 2025-01 | ✓ |
| Introduction to AI Safety, Ethics, and Society | book | Dan Hendrycks | aisafetybook.com | 2024-06 | ✓ |
| The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | paper | Nathaniel Li, Alexander Pan, Anjali Gopal et al. | wmdp.ai | 2024 | ✓ |
| Improving Alignment and Robustness with Circuit Breakers | paper | Andy Zou, Long Phan, Justin Wang et al. | arxiv.org | 2024 | — |
| HarmBench: A Standardized Evaluation Framework for Automated Red Teaming | paper | Mantas Mazeika, Long Phan, Xuwang Yin et al. | harmbench.org | 2024 | ✓ |
| Representation Engineering: A Top-Down Approach to AI Transparency | paper | Andy Zou, Long Phan, Sarah Chen et al. | arxiv.org | 2023-10 | ✓ |
| An Overview of Catastrophic AI Risks | paper | Dan Hendrycks, Mantas Mazeika, Thomas Woodside | arxiv.org | 2023-06 | — |
| Statement on AI Risk | policy-brief | CAIS | aistatement.com | 2023-05 | ✓ |
| Universal and Transferable Adversarial Attacks on Aligned Language Models | paper | Andy Zou, Zifan Wang, Nicholas Carlini et al. | llm-attacks.org | 2023 | ✓ |
| Unsolved Problems in ML Safety | paper | Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt | arxiv.org | 2021-09 | ✓ |
| Measuring Massive Multitask Language Understanding (MMLU) | paper | Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt | arxiv.org | 2020-09 | ✓ |
Divisions
- CAIS Action Fund: 501(c)(4) advocacy arm based in Washington, DC. Co-sponsored California's SB 1047; lobbying spend is roughly $490K/yr.
- Compute Cluster: 80 A100 GPUs offered free of charge to academic AI safety researchers, run in partnership with Schmidt Sciences; one of the largest non-industry compute resources available for safety research. Has supported 150+ researchers and ~100 safety papers with 16,000+ citations.
- Field-building: Programs to grow the AI safety research community, including the Statement on AI Risk signed by hundreds of researchers and the ML Safety course.
- Research: Technical AI safety research on robustness, interpretability, and alignment, led by Dan Hendrycks.
Related Wiki Pages
Representation Engineering
A top-down approach to understanding and controlling AI behavior by reading and modifying concept-level representations in neural networks.
Power-Seeking AI
Formal theoretical analysis demonstrates why optimal AI policies tend to acquire power (resources, influence, capabilities) as an instrumental goal.
Existential Risk from AI
Hypotheses concerning risks from advanced AI systems that some researchers believe could result in human extinction or permanent global catastrophe.
Dan Hendrycks
Executive Director of CAIS; focuses on catastrophic AI risk reduction through research, education, and policy advocacy.
Pause Advocacy
Advocacy for slowing or halting frontier AI development until adequate safety measures are in place. Analysis suggests a 15-40% probability of a meaningful pause.