Longterm Wiki
Updated 2026-03-13
Summary

CAIS is a nonprofit research organization founded by Dan Hendrycks that has distributed compute grants to researchers, published technical AI safety papers including the representation engineering and MACHIAVELLI benchmark papers, and organized the May 2023 Statement on AI Risk signed by over 350 AI researchers and industry leaders. The organization focuses on technical safety research, field-building, and policy communication.


CAIS (Center for AI Safety)

Type: Academic
Founded: 2022
Location: San Francisco, CA
Website: safe.ai
Related Risks: Existential Risk from AI · Power-Seeking AI
Related Organizations: Anthropic

Overview

The Center for AI Safety (CAIS) is a nonprofit research organization that works to reduce societal-scale risks from artificial intelligence through technical research, field-building initiatives, and public communication. Founded by Dan Hendrycks, CAIS received substantial public attention in May 2023 when it organized a one-sentence statement on AI extinction risk that attracted signatures from over 350 AI researchers and industry figures, including several Turing Award recipients and heads of major AI laboratories.

CAIS operates across three areas: technical research on AI alignment and robustness, grant and fellowship programs intended to grow the AI safety research community, and communication efforts aimed at policymakers and the public. Its technical output includes work on Representation Engineering and the MACHIAVELLI benchmark for evaluating goal-directed behavior in AI systems. The organization is primarily funded by Coefficient Giving, an EA-aligned philanthropic fund, a funding relationship that is relevant context for assessing its research priorities and institutional positioning.

CAIS occupies a distinct niche in the AI safety ecosystem: unlike academic centers such as CHAI or research-focused organizations like MIRI, it combines original technical research with explicit field-building and public communication goals. Critics have questioned whether its emphasis on long-run extinction risk is appropriately calibrated relative to near-term AI harms, and whether EA-concentrated funding in this space creates ideological homogeneity in safety research priorities. These debates are discussed in the Critiques and Limitations section below.

Organizational Background

CAIS was established as a nonprofit research organization with the goal of filling a perceived gap between technical AI safety research and broader scientific and public awareness of AI risks. Dan Hendrycks, who completed his PhD at UC Berkeley, founded CAIS to provide infrastructure — compute grants, fellowships, educational resources, and policy engagement — that individual academic researchers lacked access to.

The organization's theory of change rests on several linked assumptions: that AI systems pose meaningful risks of societal-scale harm, including possible catastrophic outcomes; that the current period is important for establishing safety-relevant research norms and technical methods; and that field-building activities (funding researchers, running educational programs, facilitating policy engagement) will increase the probability of good outcomes by growing and coordinating the safety research community. Whether these assumptions are well-founded is contested, and the organization's critics have argued that the extinction-risk framing in particular overstates speculative long-run risks relative to observable near-term harms.

CAIS is legally structured as a nonprofit. Its primary disclosed funder is Coefficient Giving, which has made grants to CAIS as part of its AI safety grantmaking portfolio. Exact annual budget figures are not publicly confirmed by CAIS; estimates of approximately $5M annually have circulated but have not been verified against IRS Form 990 filings, which would be the authoritative source for nonprofit financials.

Funding

CAIS's primary disclosed funder is Coefficient Giving, a philanthropic organization closely associated with the effective altruism movement. This funding relationship is material context for interpreting the organization's research agenda: Coefficient Giving has historically prioritized long-run catastrophic and extinction-level AI risk over near-term AI harms, and CAIS's framing broadly reflects this prioritization.

No comprehensive public breakdown of CAIS's funding sources or annual budget has been identified. The figure of approximately $5M annually cited in earlier versions of this page is an unverified estimate and should not be treated as authoritative. Readers seeking verified financial data should consult CAIS's IRS Form 990 filings, which are publicly available through ProPublica Nonprofit Explorer or the IRS TEOS database.

The concentration of AI safety funding through EA-aligned funders including Coefficient Giving (formerly Open Philanthropy) has been noted by critics as a potential source of ideological constraint on safety research priorities — organizations dependent on this funding may face implicit pressure to prioritize framings and research directions consistent with funder worldviews. CAIS has not publicly addressed this critique directly.

Key Research Areas

Technical Safety Research

| Research Domain | Key Contributions | Notes |
| --- | --- | --- |
| Representation Engineering | Methods for reading and steering model internal representations | Published 2023; independent replication and scalability to frontier models remains an open research question |
| Safety Benchmarks | MACHIAVELLI benchmark for evaluating goal-directed and deceptive behavior | Cited in subsequent research; the extent to which it has been formally integrated into evaluation pipelines at Anthropic or OpenAI is not publicly documented |
| Adversarial Robustness | Evaluation protocols and defense mechanisms | Part of the broader Adversarial Robustness research agenda |
| Alignment Foundations | Conceptual frameworks and problem taxonomies for AI safety | Including the "Unsolved Problems in ML Safety" paper (2022) |

Major Publications & Tools

  • Representation Engineering: A Top-Down Approach to AI Transparency (2023) — Methods for understanding and influencing AI decision-making by working with internal representations rather than input-output behavior alone
  • Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior (2023) — Introduces the MACHIAVELLI benchmark for evaluating whether AI agents pursue goals through unethical means in text-based game environments
  • Unsolved Problems in ML Safety (2022) — A taxonomy of open technical challenges in machine learning safety, intended partly as a research agenda for the field
  • Measuring Mathematical Problem Solving With the MATH Dataset (2021) — A benchmark for evaluating AI mathematical reasoning, authored by Dan Hendrycks and collaborators during his PhD at UC Berkeley; this paper predates CAIS's founding and is a product of Hendrycks's academic research rather than an organizational output of CAIS

Citation counts for these papers (figures such as "200+", "50+", "30+") previously appeared on this page without sourced methodology. Readers seeking current citation data should consult Google Scholar or Semantic Scholar directly.

Field-Building Programs

CAIS runs several programs intended to grow the population of researchers working on AI safety. The term "field-building" refers to activities designed to increase the size, diversity, and coordination of a research community — in this case, researchers focused on technical and governance aspects of AI safety.

Grant Programs

| Program | Reported Scale | Description | Timeline |
| --- | --- | --- | --- |
| Compute Grants | $2M+ distributed; recipient count reported variously as 100+ and 200+ in different CAIS materials (figure unverified) | Provides compute resources to researchers working on safety-relevant projects | 2022–present |
| ML Safety Scholars | Approximately 50 participants per cohort | Structured program for early-career researchers entering the AI safety field | 2021–present |
| Research Fellowships | $500K+ annually | Fellowships placing researchers at academic and research institutions | 2022–present |
| AI Safety Camp | 200+ participants total | Collaborative program supporting international research teams | 2020–present |

Note: Quantitative figures in this table are drawn from CAIS's own communications and have not been independently verified. The compute grant recipient count is internally inconsistent across CAIS materials (100+ in some sources, 200+ in others); the higher figure may aggregate across all field-building programs rather than compute grants alone.

Institutional Partnerships

  • Academic Collaborations: Reported collaborations with UC Berkeley, MIT, Stanford, and Oxford
  • Industry Engagement: Research interactions with Anthropic and Google DeepMind
  • Policy Connections: Briefings reported with US Congress, UK Parliament, and EU regulatory bodies

Statement on AI Risk (2023)

In May 2023, CAIS published and circulated the Statement on AI Risk, a single sentence co-signed by over 350 AI researchers and industry figures:

"Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."

The statement was covered widely in major news outlets and was cited in subsequent policy discussions, including in the context of UK and US government AI strategies. The official signatory list is available at safe.ai; the figure of 350+ is drawn from that list, though the precise count at any given time may vary as signatories are added.

Signatory Groups

| Category | Notable Signatories | Description |
| --- | --- | --- |
| Turing Award Recipients | Geoffrey Hinton, Yoshua Bengio | Recipients of computing's highest recognition who signed the statement |
| Industry Executives | Sam Altman (OpenAI), Dario Amodei (Anthropic), Demis Hassabis (DeepMind) | CEOs of major AI laboratories |
| Policy and Governance Researchers | Helen Toner, Allan Dafoe, Gillian Hadfield | Researchers working on AI governance and policy |
| ML/AI Researchers | Stuart Russell and 300+ other researchers across academia and industry | Researchers who signed as individuals, not representing institutional positions |

The statement's reception was not uniformly positive within the AI research community. A number of prominent ML researchers declined to sign or publicly criticized the statement's framing. Critics raised several concerns: that the one-sentence format was too vague to convey meaningful technical content; that equating AI risk with nuclear war risk was unsupported by available evidence; that the extinction framing could distract attention and resources from observable near-term harms from AI systems (such as bias, surveillance, and labor displacement); and that the statement's signatories were not uniformly working on extinction-risk problems, making it a weak signal of scientific consensus. These critiques were raised by researchers including those associated with AI fairness and near-term safety research communities.

Proponents argued that the statement served a legitimate coordination function: making it socially acceptable for researchers to discuss catastrophic risk publicly, signaling to policymakers that risk concerns were not fringe views, and creating a reference point for subsequent regulatory discussions. Whether the statement's net effect on AI policy and research prioritization was positive is a matter of ongoing debate.

The statement's impact on specific policy documents — including mentions in UK AI Safety Institute and US AI Safety Institute contexts — has been cited by CAIS, though the causal relationship between the statement and any particular policy outcome is difficult to establish.

Critiques and Limitations

Criticism of Extinction-Risk Framing

The most substantive criticism of CAIS's work concerns its central framing of AI extinction risk as a near-term policy priority. Critics from several directions have argued:

  • Near-term displacement effect: Emphasizing speculative long-run extinction risk may draw funding, talent, and policy attention away from near-term AI harms — discrimination in algorithmic decision-making, AI-enabled surveillance, labor market disruption, and misinformation — that are currently affecting people. Researchers associated with the AI ethics and fairness communities have made this argument most consistently.
  • Epistemic status of extinction claims: The probability of AI-caused human extinction within policy-relevant timeframes is highly uncertain, and critics have argued that treating it as a "global priority alongside pandemics and nuclear war" involves large unjustified inferential steps. Some ML researchers have noted that the mechanisms by which current or near-term AI systems could pose extinction-level risks are not specified with sufficient precision to evaluate.
  • Ideological concentration: CAIS's alignment with EA-associated funders and the broader longtermist intellectual tradition has led critics to argue that its research agenda reflects a particular philosophical worldview rather than a neutral assessment of AI risk. This critique is not unique to CAIS — it applies to several EA-funded AI safety organizations — but it is relevant to assessing how to interpret CAIS's outputs.

Limitations of Specific Research

  • Representation Engineering scalability: The representation engineering paper introduced methods that work on models of a given scale; whether these methods generalize to frontier-scale models is an open question, and independent researchers have noted limitations in the approach's applicability to very large models.
  • Benchmark validity: A general concern in AI safety evaluation is whether constructed benchmarks (including MACHIAVELLI) capture risks that manifest in real deployment contexts. The MACHIAVELLI benchmark uses text-based game environments, and the extent to which performance on these environments predicts behavior in consequential real-world settings is not established.
  • Field-building outcome measurement: CAIS reports counts of researchers supported and grant dollars distributed, but does not publicly report outcome data for its programs — for example, where ML Safety Scholars alumni work subsequently, what research they produce, or whether compute grant recipients remain in safety research. Without outcome data, the field-building impact claims are difficult to evaluate independently.

Critiques of the 2023 Statement

Beyond the framing critiques noted above, several researchers argued that the statement's format — a single declarative sentence without methodology, evidence, or mechanism — made it unsuitable as a scientific communication and more akin to a public advocacy document. Others noted that some signatories are not primarily working on extinction-risk problems, which complicated interpretation of the statement as a signal of expert consensus on the technical merits of the extinction-risk hypothesis.

Current Trajectory & Timeline

Research Roadmap

The following research priorities were described by CAIS as goals for 2024–2026. Given that this page was last edited in late 2025, some of these projections are now in the past. Actual outcomes against these goals have not been independently verified and are not currently documented on this page.

| Priority Area | Stated Goals | Status |
| --- | --- | --- |
| Representation Engineering | Scale methods to frontier models; pursue industry adoption for safety evaluation | Outcome unverified |
| Evaluation Frameworks | Develop comprehensive benchmark suite; establish standard evaluation protocols | Outcome unverified |
| Alignment Methods | Proof-of-concept demonstrations; practical implementation work | Outcome unverified |
| Policy Research | Technical governance recommendations; regulatory framework development | Outcome unverified |

A previously cited projection of "2x expansion by 2025" appeared in earlier versions of this page without a cited source. Whether this projection materialized has not been verified.

Organizational Scale

  • Staff: 15+ full-time staff reported; current headcount has not been independently verified
  • Affiliates: 50+ affiliate researchers reported
  • Budget: Approximately $5M annually — this figure is an unverified estimate; IRS Form 990 filings are the authoritative source for nonprofit financials

Key Uncertainties & Research Cruxes

Technical Challenges

These represent genuine open questions in CAIS's research agenda, not settled conclusions:

  • Representation Engineering Scalability: Whether methods developed on mid-scale models transfer reliably to frontier-scale systems remains unclear. The gap between controlled research settings and deployment conditions is a known limitation.
  • Benchmark Validity: Whether evaluations like MACHIAVELLI capture risks that manifest in real deployment — rather than behavior specific to text-game environments — is unresolved. This is a field-wide challenge, not unique to CAIS.
  • Alignment Verification: There is no established consensus on how to verify that an AI system is successfully aligned with intended goals rather than passing evaluations through surface-level pattern matching.

Strategic Questions

  • Research vs. Policy Balance: CAIS allocates resources across technical research, field-building, and policy communication. The optimal allocation is not obvious, and different observers weight these activities differently based on their models of how AI safety progress happens.
  • Open vs. Closed Research: Publishing safety research openly makes it available to the broader community but may also inform adversarial actors. CAIS has not publicly articulated a detailed position on this tradeoff.
  • Timeline Assumptions: Appropriate research priorities depend substantially on assumptions about AGI timelines and the nature of AI risk. Researchers with shorter timelines and those focused on long-run speculative risk reach different conclusions about what work is most valuable now.
  • Near-term vs. Long-term Risk Balance: Whether resources spent on extinction-risk scenarios are appropriately calibrated relative to near-term AI harms is a live debate both within and outside the AI safety community, and CAIS's position at the long-run end of this spectrum is contested.

Leadership & Key Personnel

Key People

  • Dan Hendrycks, Executive Director: UC Berkeley PhD; previously at Google Brain
  • Mantas Mazeika, Research Director: University of Chicago; research focus on adversarial machine learning
  • Thomas Woodside, Policy Director: former congressional staffer with technology policy background
  • Andy Zou, Research Scientist: CMU affiliation; research on jailbreaking and red-teaming

Note: Staff roles and affiliations reflect information available at time of last edit and may not reflect current positions. Andy Zou holds joint affiliation with CMU and CAIS; his primary institutional role should be verified against current sources.

Positioning Within the AI Safety Ecosystem

CAIS occupies a specific position within the broader AI safety research landscape that distinguishes it from peer organizations:

  • vs. MIRI: MIRI focuses almost exclusively on foundational theoretical alignment research and does not run field-building or public communication programs. CAIS's research is more empirical and its scope is broader institutionally.
  • vs. CHAI: CHAI (Center for Human-Compatible AI, UC Berkeley) is an academic center with a narrower research agenda centered on value alignment. CAIS has a more explicit field-building and policy communication mandate.
  • vs. Redwood Research: Redwood focuses on specific empirical safety problems with a small team; CAIS has a larger scope including grant programs and public communication.
  • vs. METR and ARC Evaluations: These organizations focus specifically on model evaluations and dangerous capability assessments. CAIS's evaluation work (MACHIAVELLI) is one component of a broader agenda.
  • vs. GovAI: GovAI focuses on AI governance and policy research. CAIS does policy communication but its primary identity is as a technical research organization.

The common thread across CAIS-adjacent organizations is EA-aligned funding, primarily from Coefficient Giving, which has led to criticisms that the AI safety field as constituted reflects the priorities of a relatively narrow philanthropic and ideological community rather than a broad scientific consensus.

Sources & Resources

Official Resources

| Type | Resource | Description |
| --- | --- | --- |
| Website | safe.ai | Main organization hub |
| Research | CAIS Publications | Technical papers and reports |
| Blog | CAIS Blog | Research updates and commentary |
| Courses | ML Safety Course | Educational materials on machine learning safety |

Key Research Papers

| Paper | Year | Description |
| --- | --- | --- |
| Unsolved Problems in ML Safety | 2022 | Research agenda taxonomy; citation counts should be verified via Google Scholar or Semantic Scholar |
| MACHIAVELLI Benchmark | 2023 | Evaluation framework for goal-directed AI behavior in game environments |
| Representation Engineering | 2023 | Methods for reading and steering AI model internal representations |

Related Organizations

  • Technical Safety Research: MIRI, CHAI, Redwood Research
  • Evaluations: ARC Evaluations, METR
  • Policy Focus: GovAI, RAND Corporation
  • Industry Labs: Anthropic, OpenAI, Google DeepMind
  • Funders: Coefficient Giving

References

1. CAIS Surveys · Center for AI Safety

The Center for AI Safety conducts technical and conceptual research to mitigate potential catastrophic risks from advanced AI systems. They take a comprehensive approach spanning technical research, philosophy, and societal implications.

4. MACHIAVELLI dataset · Alexander Pan et al. · arXiv · 2023
5. Unsolved Problems in ML Safety · Dan Hendrycks, Nicholas Carlini, John Schulman & Jacob Steinhardt · arXiv · 2021
6. MATH · Dan Hendrycks et al. · arXiv · 2021
7. AI Risk Statement · Center for AI Safety
8. CAIS Publications · Center for AI Safety
9. CAIS Blog · Center for AI Safety
12. OpenAI · OpenAI
13. Representation engineering · Center for AI Safety
14. Anthropic · Anthropic

Related Pages

Safety Research: Anthropic Core Views
Approaches: Capability Unlearning / Removal · AI Alignment
Analysis: AI Compute Scaling Metrics · AI Safety Intervention Effectiveness Matrix
Policy: MAIM (Mutually Assured AI Malfunction) · US State AI Legislation Landscape
Other: Dan Hendrycks · Geoffrey Hinton · Stuart Russell
Organizations: Center for Human-Compatible AI · Google DeepMind · US AI Safety Institute · Redwood Research
Concepts: AGI Timeline · Agentic AI · Self-Improvement and Recursive Enhancement
Risks: AI-Induced Irreversibility
Key Debates: AI Safety Field Building and Community
Historical: The MIRI Era