Also known as: Redwood
Founded by Nate Thomas and Buck Shlegeris
A nonprofit AI safety and security research organization founded in 2021, known for pioneering AI Control research, developing causal scrubbing interpretability methods, and conducting landmark alignment faking studies with Anthropic.
Facts: 10
Divisions: 1 — Core research team working on interpretability and adversarial training techniques. A small organization (~15-20 people) focused on applied alignment research, including causal scrubbing and circuit-level interpretability.
Prediction Markets: 8 active
Related Wiki Pages
Top Related Pages
AI Control
A defensive safety approach maintaining control over potentially misaligned AI systems through monitoring, containment, and redundancy, offering 40...
Anthropic
An AI safety company founded by former OpenAI researchers that develops frontier AI models while pursuing safety research, including the Claude mod...
AI Alignment Research Agendas
Analysis of major AI safety research agendas comparing approaches from Anthropic ($100M+ annual safety budget, 37-39% team growth), DeepMind (30-5...
Technical AI Safety Research
Technical AI safety research aims to make AI systems reliably safe through scientific and engineering work.
Scheming
AI scheming, strategic deception during training to pursue hidden goals, has been demonstrated to emerge in frontier models.