CAIS (Center for AI Safety)
CAIS is a nonprofit research organization founded by Dan Hendrycks that has distributed compute grants to researchers, published technical AI safety papers including the representation engineering and MACHIAVELLI benchmark papers, and organized the May 2023 Statement on AI Risk signed by over 350 AI researchers and industry leaders. The organization focuses on technical safety research, field-building, and policy communication.
Overview
The Center for AI Safety (CAIS) is a nonprofit research organization that works to reduce societal-scale risks from artificial intelligence through technical research, field-building initiatives, and public communication. Founded by Dan Hendrycks, CAIS received substantial public attention in May 2023 when it organized a one-sentence statement on AI extinction risk that attracted signatures from over 350 AI researchers and industry figures, including several Turing Award recipients and heads of major AI laboratories.
CAIS operates across three areas: technical research on AI alignment and robustness, grant and fellowship programs intended to grow the AI safety research community, and communication efforts aimed at policymakers and the public. Its technical output includes work on Representation Engineering and the MACHIAVELLI benchmark for evaluating goal-directed behavior in AI systems. The organization is primarily funded by Coefficient Giving, an EA-aligned philanthropic fund, a funding relationship that is relevant context for assessing its research priorities and institutional positioning.
CAIS occupies a distinct niche in the AI safety ecosystem: unlike academic centers such as CHAI or research-focused organizations like MIRI, it combines original technical research with explicit field-building and public communication goals. Critics have questioned whether its emphasis on long-run extinction risk is appropriately calibrated relative to near-term AI harms, and whether EA-concentrated funding in this space creates ideological homogeneity in safety research priorities. These debates are discussed in the Critiques and Limitations section below.
Organizational Background
CAIS was established as a nonprofit research organization with the goal of filling a perceived gap between technical AI safety research and broader scientific and public awareness of AI risks. Dan Hendrycks, who completed his PhD at UC Berkeley, founded CAIS to provide infrastructure — compute grants, fellowships, educational resources, and policy engagement — that individual academic researchers lacked access to.
The organization's theory of change rests on several linked assumptions: that AI systems pose meaningful risks of societal-scale harm, including possible catastrophic outcomes; that the current period is important for establishing safety-relevant research norms and technical methods; and that field-building activities (funding researchers, running educational programs, facilitating policy engagement) will increase the probability of good outcomes by growing and coordinating the safety research community. Whether these assumptions are well-founded is contested, and the organization's critics have argued that the extinction-risk framing in particular overstates speculative long-run risks relative to observable near-term harms.
CAIS is legally structured as a nonprofit. Its primary disclosed funder is Coefficient Giving, which has made grants to CAIS as part of its AI safety grantmaking portfolio. Exact annual budget figures are not publicly confirmed by CAIS; estimates of approximately $5M annually have circulated but have not been verified against IRS Form 990 filings, which would be the authoritative source for nonprofit financials.
Funding
CAIS's primary disclosed funder is Coefficient Giving, a philanthropic organization closely associated with the effective altruism movement. This funding relationship is material context for interpreting the organization's research agenda: Coefficient Giving has historically prioritized long-run catastrophic and extinction-level AI risk over near-term AI harms, and CAIS's framing broadly reflects this prioritization.
No comprehensive public breakdown of CAIS's funding sources or annual budget has been identified. The figure of approximately $5M annually cited in earlier versions of this page is an unverified estimate and should not be treated as authoritative. Readers seeking verified financial data should consult CAIS's IRS Form 990 filings, which are publicly available through ProPublica Nonprofit Explorer or the IRS TEOS database.
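For readers who want to pull those filings programmatically, ProPublica exposes its parsed Form 990 data through a public API. The following is a minimal sketch assuming ProPublica's documented v2 endpoints; the response field names follow its parsed-990 schema and should be verified against the live API before the output is relied on.

```python
"""Sketch: look up a nonprofit's Form 990 financials via ProPublica's
Nonprofit Explorer API (v2). Endpoint paths and field names should be
checked against ProPublica's current API documentation."""
import requests

API = "https://projects.propublica.org/nonprofits/api/v2"

def find_ein(query: str) -> int:
    """Search by organization name and return the first matching EIN."""
    resp = requests.get(f"{API}/search.json", params={"q": query}, timeout=30)
    resp.raise_for_status()
    orgs = resp.json()["organizations"]
    if not orgs:
        raise LookupError(f"no organizations match {query!r}")
    return orgs[0]["ein"]

def filings_with_data(ein: int) -> list[dict]:
    """Return filings for which ProPublica has parsed financial data."""
    resp = requests.get(f"{API}/organizations/{ein}.json", timeout=30)
    resp.raise_for_status()
    return resp.json().get("filings_with_data", [])

if __name__ == "__main__":
    ein = find_ein("Center for AI Safety")
    for f in filings_with_data(ein):
        # Field names per ProPublica's parsed-990 schema (verify live);
        # the first search hit may need disambiguation by name or state.
        print(f.get("tax_prd_yr"), "revenue:", f.get("totrevenue"),
              "expenses:", f.get("totfuncexpns"))
```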
The concentration of AI safety funding through EA-aligned funders including Coefficient Giving (formerly Open Philanthropy) has been noted by critics as a potential source of ideological constraint on safety research priorities — organizations dependent on this funding may face implicit pressure to prioritize framings and research directions consistent with funder worldviews. CAIS has not publicly addressed this critique directly.
Key Research Areas
Technical Safety Research
| Research Domain | Key Contributions | Notes |
|---|---|---|
| Representation Engineering | Methods for reading and steering model internal representations | Published 2023; independent replication and scalability to frontier models remain open research questions (see the sketch below this table) |
| Safety Benchmarks | MACHIAVELLI benchmark for evaluating goal-directed and deceptive behavior | Cited in subsequent research; the extent to which it has been formally integrated into evaluation pipelines at Anthropic or OpenAI is not publicly documented |
| Adversarial Robustness | Evaluation protocols and defense mechanisms | Part of the broader Adversarial Robustness research agenda |
| Alignment Foundations | Conceptual frameworks and problem taxonomies for AI safety | Including the "Unsolved Problems in ML Safety" paper (2022) |
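The table's first row can be made concrete with a toy activation-addition sketch: estimate a concept direction from a contrasting prompt pair, then add that direction back into a hidden layer during generation. This illustrates the general contrast-vector idea rather than the CAIS paper's actual pipeline; the model (gpt2), the layer index, and the steering strength are arbitrary assumptions for demonstration.

```python
"""Toy sketch of reading and steering internal representations.
Not the paper's method: a single prompt pair stands in for the averaged
contrast sets a real pipeline would use."""
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 6  # which transformer block to steer (arbitrary assumption)

def hidden_at(prompt: str) -> torch.Tensor:
    """Last-token hidden state just after block LAYER."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[0] is the embedding output, so block LAYER's
    # output sits at index LAYER + 1.
    return out.hidden_states[LAYER + 1][0, -1]

# "Reading": a concept direction as the difference between hidden
# states of contrasting prompts.
direction = hidden_at("The movie was wonderful") - \
            hidden_at("The movie was terrible")
direction = direction / direction.norm()

# "Steering": add the scaled direction to block LAYER's output on
# every forward pass via a hook.
ALPHA = 8.0  # steering strength (assumption; tune per model/layer)

def steer(module, inputs, output):
    return (output[0] + ALPHA * direction,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer)
prompt_ids = tok("I thought the restaurant was", return_tensors="pt")
print(tok.decode(model.generate(**prompt_ids, max_new_tokens=20,
                                do_sample=False)[0]))
handle.remove()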
Major Publications & Tools
- Representation Engineering: A Top-Down Approach to AI Transparency (Zou et al., 2023) — Methods for understanding and influencing AI decision-making by working with internal representations rather than input-output behavior alone
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior (Pan et al., 2023) — Introduces the MACHIAVELLI benchmark for evaluating whether AI agents pursue goals through unethical means in text-based game environments
- Unsolved Problems in ML Safety (Hendrycks et al., 2022) — A taxonomy of open technical challenges in machine learning safety, intended partly as a research agenda for the field
- Measuring Mathematical Problem Solving With the MATH Dataset (Hendrycks et al., 2021) — A benchmark for evaluating AI mathematical reasoning, authored by Dan Hendrycks and collaborators during his PhD at UC Berkeley; this paper predates CAIS's founding and is a product of Hendrycks's academic research rather than an organizational output of CAIS
Citation counts for these papers (figures such as "200+", "50+", "30+") previously appeared on this page without sourced methodology. Readers seeking current citation data should consult Google Scholar or Semantic Scholar directly.
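As a hedged illustration of how to do that lookup programmatically, the sketch below queries the Semantic Scholar Graph API for the papers above. The arXiv identifiers are believed to match these papers but are assumptions that should be double-checked against the arXiv listings.

```python
"""Fetch live citation counts from the Semantic Scholar Graph API.
The arXiv IDs below are assumed mappings; verify before relying on them."""
import requests

PAPERS = {
    "Unsolved Problems in ML Safety": "2109.13916",
    "MACHIAVELLI benchmark": "2304.03279",
    "Representation Engineering": "2310.01405",
}

for name, arxiv_id in PAPERS.items():
    resp = requests.get(
        f"https://api.semanticscholar.org/graph/v1/paper/arXiv:{arxiv_id}",
        params={"fields": "title,citationCount,year"},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    # Unauthenticated requests are rate-limited; add an API key header
    # for bulk use.
    print(f"{name}: {data['citationCount']} citations ({data['year']})")
```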
Field-Building Programs
CAIS runs several programs intended to grow the population of researchers working on AI safety. The term "field-building" refers to activities designed to increase the size, diversity, and coordination of a research community — in this case, researchers focused on technical and governance aspects of AI safety.
Grant Programs
| Program | Reported Scale | Description | Timeline |
|---|---|---|---|
| Compute Grants | $2M+ distributed; number of recipients reported variously as 100+ and 200+ in different CAIS materials — figure unverified | Provides compute resources to researchers working on safety-relevant projects | 2022–present |
| ML Safety Scholars | Approximately 50 participants per cohort | Structured program for early-career researchers entering the AI safety field | 2021–present |
| Research Fellowships | $500K+ annually | Fellowships placing researchers at academic and research institutions | 2022–present |
| AI Safety Camp | 200+ participants total | Collaborative program supporting international research teams | 2020–present |
Note: Quantitative figures in this table are drawn from CAIS's own communications and have not been independently verified. The compute grant recipient count is internally inconsistent across CAIS materials (100+ in some sources, 200+ in others); the higher figure may aggregate across all field-building programs rather than compute grants alone.
Institutional Partnerships
- Academic Collaborations: Reported collaborations with UC Berkeley, MIT, Stanford, and Oxford
- Industry Engagement: Research interactions with Anthropic and Google DeepMind
- Policy Connections: Briefings reported with US Congress, UK Parliament, and EU regulatory bodies
Statement on AI Risk (2023)
In May 2023, CAIS published and circulated the Statement on AI Risk, a single sentence co-signed by over 350 AI researchers and industry figures:
"Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."
The statement was covered widely in major news outlets and was cited in subsequent policy discussions, including in the context of UK and US government AI strategies. The official signatory list is available at safe.ai; the figure of 350+ is drawn from that list, though the precise count at any given time may vary as signatories are added.
Signatory Groups
| Category | Notable Signatories | Description |
|---|---|---|
| Turing Award Recipients | Geoffrey Hinton, Yoshua Bengio, Stuart Russell | Recipients of computing's highest recognition who signed the statement |
| Industry Executives | Sam Altman (OpenAI), Dario Amodei (Anthropic), Demis Hassabis (DeepMind) | CEOs of major AI laboratories |
| Policy and Governance Researchers | Helen Toner, Allan Dafoe, Gillian Hadfield | Researchers working on AI governance and policy |
| ML/AI Researchers | 300+ researchers across academia and industry | Researchers who signed as individuals, not representing institutional positions |
The statement's reception was not uniformly positive within the AI research community. A number of prominent ML researchers declined to sign or publicly criticized the statement's framing. Critics raised several concerns: that the one-sentence format was too vague to convey meaningful technical content; that equating AI risk with nuclear war risk was unsupported by available evidence; that the extinction framing could distract attention and resources from observable near-term harms from AI systems (such as bias, surveillance, and labor displacement); and that the statement's signatories were not uniformly working on extinction-risk problems, making it a weak signal of scientific consensus. These critiques came most consistently from researchers in the AI fairness and near-term safety communities.
Proponents argued that the statement served a legitimate coordination function: making it socially acceptable for researchers to discuss catastrophic risk publicly, signaling to policymakers that risk concerns were not fringe views, and creating a reference point for subsequent regulatory discussions. Whether the statement's net effect on AI policy and research prioritization was positive is a matter of ongoing debate.
The statement's impact on specific policy documents — including mentions in UK AI Safety Institute and US AI Safety Institute contexts — has been cited by CAIS, though the causal relationship between the statement and any particular policy outcome is difficult to establish.
Critiques and Limitations
Criticism of Extinction-Risk Framing
The most substantive criticism of CAIS's work concerns its central framing of AI extinction risk as a near-term policy priority. Critics from several directions have argued:
- Near-term displacement effect: Emphasizing speculative long-run extinction risk may draw funding, talent, and policy attention away from near-term AI harms — discrimination in algorithmic decision-making, AI-enabled surveillance, labor market disruption, and misinformation — that are currently affecting people. Researchers associated with the AI ethics and fairness communities have made this argument most consistently.
- Epistemic status of extinction claims: The probability of AI-caused human extinction within policy-relevant timeframes is highly uncertain, and critics have argued that treating it as a "global priority alongside pandemics and nuclear war" involves large unjustified inferential steps. Some ML researchers have noted that the mechanisms by which current or near-term AI systems could pose extinction-level risks are not specified with sufficient precision to evaluate.
- Ideological concentration: CAIS's alignment with EA-associated funders and the broader longtermist intellectual tradition has led critics to argue that its research agenda reflects a particular philosophical worldview rather than a neutral assessment of AI risk. This critique is not unique to CAIS — it applies to several EA-funded AI safety organizations — but it is relevant to assessing how to interpret CAIS's outputs.
Limitations of Specific Research
- Representation Engineering scalability: The representation engineering paper demonstrated its methods on models of moderate scale; whether they generalize to frontier-scale models is an open question, and independent researchers have noted limits on the approach's applicability to very large models.
- Benchmark validity: A general concern in AI safety evaluation is whether constructed benchmarks (including MACHIAVELLI) capture risks that manifest in real deployment contexts. The MACHIAVELLI benchmark uses text-based game environments, and the extent to which performance on these environments predicts behavior in consequential real-world settings is not established (a toy illustration of the benchmark's reward-versus-ethics structure follows this list).
- Field-building outcome measurement: CAIS reports counts of researchers supported and grant dollars distributed, but does not publicly report outcome data for its programs — for example, where ML Safety Scholars alumni work subsequently, what research they produce, or whether compute grant recipients remain in safety research. Without outcome data, the field-building impact claims are difficult to evaluate independently.
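For readers unfamiliar with MACHIAVELLI's setup, the toy sketch below shows the shape of the reward-versus-ethics computation it performs. The trajectories and annotations here are invented stand-ins, not the benchmark's actual data format, which annotates scenes in text-adventure games with behavioral labels.

```python
"""Toy illustration of the reward-vs-ethics trade-off MACHIAVELLI
measures. All data below is hypothetical."""
from dataclasses import dataclass

@dataclass
class Step:
    reward: float    # in-game points gained at this step
    violations: int  # annotated ethical violations (e.g., deception)

def score(trajectory: list[Step]) -> tuple[float, int]:
    """Total reward and total annotated violations for one agent run."""
    return (sum(s.reward for s in trajectory),
            sum(s.violations for s in trajectory))

# Two hypothetical agents: one maximizes reward regardless of the
# annotations, one abstains from violations at some cost in reward.
ruthless = [Step(5.0, 1), Step(8.0, 2), Step(3.0, 0)]
cautious = [Step(4.0, 0), Step(2.0, 0), Step(3.0, 0)]

for name, traj in [("ruthless", ruthless), ("cautious", cautious)]:
    r, v = score(traj)
    print(f"{name}: reward={r}, violations={v}")
```

The open validity question is whether scores of this kind, computed over text-game annotations, predict anything about an agent's behavior in consequential real-world settings.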
Critiques of the 2023 Statement
Beyond the framing critiques noted above, several researchers argued that the statement's format — a single declarative sentence without methodology, evidence, or mechanism — made it unsuitable as a scientific communication and more akin to a public advocacy document. Others noted that some signatories are not primarily working on extinction-risk problems, which complicated interpretation of the statement as a signal of expert consensus on the technical merits of the extinction-risk hypothesis.
Current Trajectory & Timeline
Research Roadmap
The following research priorities were described by CAIS as goals for 2024–2026. Given that this page was last edited in late 2025, some of these projections are now in the past. Actual outcomes against these goals have not been independently verified and are not currently documented on this page.
| Priority Area | Stated Goals | Status |
|---|---|---|
| Representation Engineering | Scale methods to frontier models; pursue industry adoption for safety evaluation | Outcome unverified |
| Evaluation Frameworks | Develop comprehensive benchmark suite; establish standard evaluation protocols | Outcome unverified |
| Alignment Methods | Proof-of-concept demonstrations; practical implementation work | Outcome unverified |
| Policy Research | Technical governance recommendations; regulatory framework development | Outcome unverified |
A previously cited projection of "2x expansion by 2025" appeared in earlier versions of this page without a cited source. Whether this projection materialized has not been verified.
Organizational Scale
- Staff: 15+ full-time staff reported; current headcount has not been independently verified
- Affiliates: 50+ affiliate researchers reported
- Budget: Approximately $5M annually — this figure is an unverified estimate; IRS Form 990 filings are the authoritative source for nonprofit financials
Key Uncertainties & Research Cruxes
Technical Challenges
These represent genuine open questions in CAIS's research agenda, not settled conclusions:
- Representation Engineering Scalability: Whether methods developed on mid-scale models transfer reliably to frontier-scale systems remains unclear. The gap between controlled research settings and deployment conditions is a known limitation.
- Benchmark Validity: Whether evaluations like MACHIAVELLI capture risks that manifest in real deployment — rather than behavior specific to text-game environments — is unresolved. This is a field-wide challenge, not unique to CAIS.
- Alignment Verification: There is no established consensus on how to verify that an AI system is successfully aligned with intended goals rather than passing evaluations through surface-level pattern matching.
Strategic Questions
- Research vs. Policy Balance: CAIS allocates resources across technical research, field-building, and policy communication. The optimal allocation is not obvious, and different observers weight these activities differently based on their models of how AI safety progress happens.
- Open vs. Closed Research: Publishing safety research openly makes it available to the broader community but may also inform adversarial actors. CAIS has not publicly articulated a detailed position on this tradeoff.
- Timeline Assumptions: Appropriate research priorities depend substantially on assumptions about AGI timelines and the nature of AI risk. Researchers with shorter timelines and those focused on long-run speculative risk reach different conclusions about what work is most valuable now.
- Near-term vs. Long-term Risk Balance: Whether resources spent on extinction-risk scenarios are appropriately calibrated relative to near-term AI harms is a live debate both within and outside the AI safety community, and CAIS's position at the long-run end of this spectrum is contested.
Leadership & Key Personnel
Key People
- Dan Hendrycks: Founder and Executive Director
- Andy Zou: Researcher; joint affiliation with CMU and CAIS

Note: Staff roles and affiliations reflect information available at time of last edit and may not reflect current positions; affiliations, including Andy Zou's primary institutional role, should be verified against current sources.
Positioning Within the AI Safety Ecosystem
CAIS occupies a specific position within the broader AI safety research landscape that distinguishes it from peer organizations:
- vs. MIRI: MIRI focuses almost exclusively on foundational theoretical alignment research and does not run field-building or public communication programs. CAIS's research is more empirical and its scope is broader institutionally.
- vs. CHAI: CHAI (Center for Human-Compatible AI, UC Berkeley) is an academic center with a narrower research agenda centered on value alignment. CAIS has a more explicit field-building and policy communication mandate.
- vs. Redwood Research: Redwood focuses on specific empirical safety problems with a small team; CAIS has a larger scope including grant programs and public communication.
- vs. METR and ARC Evaluations: These organizations focus specifically on model evaluations and dangerous capability assessments. CAIS's evaluation work (MACHIAVELLI) is one component of a broader agenda.
- vs. GovAI: GovAI focuses on AI governance and policy research. CAIS does policy communication but its primary identity is as a technical research organization.
The common thread across CAIS-adjacent organizations is EA-aligned funding, primarily from Coefficient Giving, which has led to criticisms that the AI safety field as constituted reflects the priorities of a relatively narrow philanthropic and ideological community rather than a broad scientific consensus.
Sources & Resources
Official Resources
| Type | Resource | Description |
|---|---|---|
| Website | safe.ai | Main organization hub |
| Research | CAIS Publications | Technical papers and reports |
| Blog | CAIS Blog | Research updates and commentary |
| Courses | ML Safety Course | Educational materials on machine learning safety |
Key Research Papers
| Paper | Year | Description |
|---|---|---|
| Unsolved Problems in ML Safety | 2022 | Research agenda taxonomy; citation counts should be verified via Google Scholar or Semantic Scholar |
| MACHIAVELLI Benchmark | 2023 | Evaluation framework for goal-directed AI behavior in game environments |
| Representation Engineering | 2023 | Methods for reading and steering AI model internal representations |
Related Organizations
- Technical Safety Research: MIRI, CHAI, Redwood Research
- Evaluations: ARC Evaluations, METR
- Policy Focus: GovAI, RAND Corporation
- Industry Labs: Anthropic, OpenAI, Google DeepMind
- Funders: Coefficient Giving