
Constitutional AI

Alignment Training (active)

A training methodology that uses an explicit set of written principles (a "constitution") and reinforcement learning from AI feedback (RLAIF) to train safer language models.
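The supervised phase of this methodology can be sketched as a critique-and-revision loop: the model critiques its own response against each principle, then rewrites it. The sketch below is illustrative only; `generate` is a stub standing in for a language model call, and the two principles are hypothetical examples, not the actual constitution.

```python
# Minimal sketch of the Constitutional AI supervised phase, assuming a
# generic text-generation function. Not a real API: `generate` is a stub.

CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most honest.",
]

def generate(prompt: str) -> str:
    # Stub: a real system would call a language model here.
    return f"[model output for: {prompt[:40]}]"

def critique_and_revise(prompt: str, response: str) -> str:
    """For each principle, ask the model to critique its response and
    then revise it; the final revision becomes supervised training data."""
    for principle in CONSTITUTION:
        critique = generate(
            f"Principle: {principle}\nResponse: {response}\n"
            "Identify how the response violates the principle."
        )
        response = generate(
            f"Response: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response

revised = critique_and_revise("User question", "Initial draft answer")
```

In the full method, pairs of responses are later ranked by the AI itself against the same principles, and those AI-generated preferences replace human labels when training the reward model (the RLAIF step).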

First Proposed: 2022 (Bai et al., Anthropic)
Cluster: Alignment Training
Parent Area: RLHF

Tags

function:specification · stage:training · scope:technique

Organizations (1)

Organization    Role
Anthropic       Pioneer

Key Papers & Resources (1)

Seminal: Bai et al. (2022), "Constitutional AI: Harmlessness from AI Feedback", Anthropic.