
Redwood Research

Safety Organization
Founded: Jun 2021 (4 years old) · HQ: San Francisco, CA · redwoodresearch.org

Also known as: Redwood


A nonprofit AI safety and security research organization founded in 2021, known for pioneering AI Control research, developing the causal scrubbing interpretability method, and conducting landmark alignment faking studies with Anthropic.

Revenue: $22K (as of 2024)
Headcount: 34 (as of 2023)
Annual Expenses: $2.9M (as of 2024)
Net Assets: $6.5M (as of 2024)

Key Metrics

Revenue (ARR): annual run rate of $14M in 2021 declining to $22K in 2024
Headcount: 10 employees in 2021 growing to 34 in 2023

Facts (9 entries)

Financial
Revenue: $22K
Net Assets: $6.5M
Annual Expenses: $2.9M
Headcount: 34

Other
Legal Identifier: 87-1702255 (EIN)

Organization
Legal Structure: Nonprofit research lab
Headquarters: San Francisco, CA
Founded Date: Jun 2021

Other Data

Entity Assessments (4 entries)

Dimension | Rating | Evidence | Assessor
focus-area | AI systems acting against developer interests | Primary research on AI Control and alignment faking | editorial
funding | $25M+ from Coefficient Giving | $9.4M (2021), $10.7M (2022), $5.3M (2023) | editorial
key-concern | Research output relative to funding | 2023 critics cited limited publications; subsequent ICML, NeurIPS work addressed this | editorial
team-size | 10 staff (2021), 6-15 research staff (2023 estimate) | Early team of 10 expanded to research organization | editorial
Entity Events (7 entries)

Title | Date | Event Type | Description | Significance
Alignment faking paper with Anthropic | 2024-12 | publication | Landmark collaboration with Anthropic on alignment faking research. | major
Buck Shlegeris becomes CEO; AI Control ICML oral | 2024 | leadership-change | Buck Shlegeris transitions from CTO to CEO and Director; Ryan Greenblatt serves as Chief Scientist. AI Control work accepted as an ICML oral. | major
REMIX interpretability program runs | 2023 | launch | Mechanistic interpretability training program for ~10-15 junior researchers. | moderate
Adversarial robustness research project | 2022 | milestone | Initial adversarial training project; later acknowledged by leadership as unsuccessful. | minor
Causal scrubbing methodology developed | 2022 | publication | Developed across 2022-2023; method for rigorously testing mechanistic interpretability claims. | moderate
MLAB bootcamp launches | 2021-12 | launch | Inaugural ML for Alignment Bootcamp with 40 participants; 3-week intensive teaching attendees to build BERT/GPT-2 from scratch. | moderate
Tax-exempt status granted; 10 staff assembled | 2021-09 | founding | | major

Divisions (1 entry)

Team
Core research team working on interpretability and adversarial training techniques. Small org (~15-20 people) focused on applied alignment research, including causal scrubbing and circuit-level interpretability.

Prediction Markets: 8 active

Related Wiki Pages

Top Related Pages

Safety Research

Anthropic Core Views

Approaches

AI Alignment
Capability Elicitation

Analysis

AI Safety Multi-Actor Strategic Landscape
AI Safety Intervention Effectiveness Matrix

Policy

Safe and Secure Innovation for Frontier Artificial Intelligence Models Act
New York RAISE Act

Organizations

Alignment Research Center (ARC)
Conjecture
Machine Intelligence Research Institute (MIRI)

Risks

Deceptive Alignment
AI Capability Sandbagging

Other

Ajeya Cotra
Holden Karnofsky
Interpretability

Concepts

EA Epistemic Failures in the FTX Era
Large Language Models
Situational Awareness

Key Debates

Why Alignment Might Be Easy