Longterm Wiki

Constitutional AI

Anthropic's Constitutional AI methodology uses explicit principles and AI-generated feedback to train safer language models, demonstrating 3-10x improvements in harmlessness while maintaining helpfulness across major model deployments.

Related Pages

Safety Research

Anthropic Core Views

Risks

Scheming

Analysis

AI Safety Intervention Effectiveness Matrix
Anthropic Impact Assessment Model

Concepts

Dense Transformers
Existential Risk from AI
Situational Awareness
Agentic AI

Other

Dario Amodei
AI Control
Claude
Eliezer Yudkowsky
Anthropic Stakeholders

Organizations

Google DeepMind

Key Debates

AI Alignment Research Agendas
AI Accident Risk Cruxes
Why Alignment Might Be Hard
Why Alignment Might Be Easy

Policy

AI Model Specifications

Historical

Mainstream Era

Tags

constitutional-ai, rlaif, harmlessness, training-methodology, anthropic