
Interpretability

Team · Active · Anthropic · January 2021 – present

The Interpretability team's mission is to discover and understand how large language models work internally, as a foundation for AI safety and positive outcomes. The team is led by Chris Olah.
