Also known as: Redwood
A nonprofit AI safety and security research organization founded in 2021, known for pioneering AI Control research, developing causal scrubbing interpretability methods, and conducting landmark alignment faking studies with Anthropic.
Facts
| Dimension | Assessment | Evidence | Assessor |
|---|---|---|---|
| focus-area | AI systems acting against developer interests | Primary research on AI Control and alignment faking | editorial |
| funding | $25M+ from Coefficient Giving | $9.4M (2021), $10.7M (2022), $5.3M (2023) | editorial |
| key-concern | Research output relative to funding | 2023 critics cited limited publications; subsequent ICML and NeurIPS work addressed this | editorial |
| team-size | 10 staff (2021); 6-15 research staff (2023 estimate) | Early team of 10 expanded into a research organization | editorial |
| Title | Date | Event Type | Description | Significance |
|---|---|---|---|---|
| Alignment faking paper with Anthropic | 2024-12 | publication | Landmark collaboration with Anthropic on alignment faking research. | major |
| Buck Shlegeris becomes CEO; AI Control ICML oral | 2024 | leadership-change | Buck Shlegeris transitions from CTO to CEO and Director; Ryan Greenblatt serves as Chief Scientist. AI Control work accepted as an ICML oral. | major |
| REMIX interpretability program runs | 2023 | launch | Mechanistic interpretability training program for ~10-15 junior researchers. | moderate |
| Adversarial robustness research project | 2022 | milestone | Initial adversarial training project; later acknowledged by leadership as unsuccessful. | minor |
| Causal scrubbing methodology developed | 2022 | publication | Developed across 2022-2023; a method for rigorously testing mechanistic interpretability claims (see the sketch after this timeline). | moderate |
| MLAB bootcamp launches | 2021-12 | launch | Inaugural ML for Alignment Bootcamp with 40 participants; a 3-week intensive teaching attendees to build BERT/GPT-2 from scratch. | moderate |
| Tax-exempt status granted; 10 staff assembled | 2021-09 | founding | — | major |
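The timeline names causal scrubbing without illustrating the core idea: if a hypothesis claims only certain activations matter for a behavior, then resampling the supposedly irrelevant activations from unrelated inputs should leave the behavior (e.g., loss) roughly unchanged. The following is a minimal, hypothetical sketch of that check; `ToyMLP`, `KEEP`, and `SCRUB` are invented for illustration and are not Redwood's published code.

```python
# Minimal causal-scrubbing-style check on a toy 2-layer MLP.
# Hypothesis under test: only the hidden units in KEEP matter for the output.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyMLP(nn.Module):
    def __init__(self, d_in=8, d_hidden=16, d_out=2):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x, hidden_override=None, override_idx=None):
        h = torch.relu(self.fc1(x))
        if hidden_override is not None:
            # Splice resampled activations into the hypothesised-irrelevant units.
            h = h.clone()
            h[:, override_idx] = hidden_override[:, override_idx]
        return self.fc2(h)

model = ToyMLP()
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 8)
y = torch.randint(0, 2, (256,))

KEEP = torch.arange(0, 4)    # units the hypothesis says matter
SCRUB = torch.arange(4, 16)  # units the hypothesis says are irrelevant

with torch.no_grad():
    base_loss = loss_fn(model(x), y)

    # Resample: take hidden activations from a shuffled batch, so the
    # "irrelevant" units carry activations computed on unrelated inputs.
    perm = torch.randperm(x.size(0))
    donor_hidden = torch.relu(model.fc1(x[perm]))
    scrubbed_loss = loss_fn(model(x, donor_hidden, SCRUB), y)

# If the hypothesis is right, scrubbing should barely change the loss.
print(f"base loss: {base_loss.item():.4f}  scrubbed loss: {scrubbed_loss.item():.4f}")
```

A large gap between the base and scrubbed losses falsifies the hypothesis; the real methodology generalizes this to resampling over whole computational subgraphs rather than a single layer.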
Divisions
Core research team working on interpretability and adversarial training techniques. A small organization (~15-20 people) focused on applied alignment research, including causal scrubbing and circuit-level interpretability.
Prediction Markets
8 active
Related Wiki Pages
Top Related Pages
AI Control
A defensive safety approach maintaining control over potentially misaligned AI systems through monitoring, containment, and redundancy, offering 40... (a minimal monitoring sketch appears after this list)
Anthropic
An AI safety company founded by former OpenAI researchers that develops frontier AI models while pursuing safety research, including the Claude mod...
AI Alignment Research Agendas
Analysis of major AI safety research agendas comparing approaches from Anthropic ($100M+ annual safety budget, 37-39% team growth), DeepMind (30-5...
Technical AI Safety Research
Technical AI safety research aims to make AI systems reliably safe through scientific and engineering work.
Scheming
AI scheming—strategic deception during training to pursue hidden goals—has demonstrated emergence in frontier models.
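The AI Control entry above compresses the approach into one line. As a concrete illustration of the monitoring component, here is a minimal, hypothetical trusted-monitoring loop: an untrusted model's outputs are scored for suspiciousness by a trusted model, and only the most suspicious fraction is routed to limited human audit. All names here (`untrusted_generate`, `trusted_suspicion_score`, `human_audit`, `AUDIT_BUDGET`) are stand-ins, not Redwood's published protocol code.

```python
# Sketch of a trusted-monitoring control protocol under an audit budget.
import random

random.seed(0)

AUDIT_BUDGET = 0.02  # fraction of outputs a human can afford to audit

def untrusted_generate(task: str) -> str:
    """Stand-in for a capable but potentially misaligned model."""
    return f"solution for {task}"

def trusted_suspicion_score(task: str, output: str) -> float:
    """Stand-in for a weaker, trusted model rating how suspicious an output is."""
    return random.random()

def human_audit(task: str, output: str) -> bool:
    """Stand-in for expensive human review; returns True if the output is safe."""
    return True

tasks = [f"task-{i}" for i in range(1000)]
scored = []
for task in tasks:
    output = untrusted_generate(task)
    scored.append((trusted_suspicion_score(task, output), task, output))

# Spend the audit budget on the most suspicious outputs only.
scored.sort(reverse=True)
n_audit = int(AUDIT_BUDGET * len(scored))
for score, task, output in scored[:n_audit]:
    if not human_audit(task, output):
        print(f"blocked {task} (suspicion {score:.2f})")

accepted = [t for _, t, _ in scored[n_audit:]]
print(f"audited {n_audit} outputs; accepted {len(accepted)} without audit")
```

The design choice being illustrated is that safety comes from the protocol (scoring plus budgeted auditing) rather than from trusting the capable model itself.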