Longterm Wiki

Direct Preference Optimization

Alignment Training (active)

A family of reward-free alignment methods (DPO, KTO, IPO, ORPO, GRPO) that optimize a policy directly on preference data, bypassing explicit reward-model training.
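The core idea can be sketched with the DPO loss for a single preference pair. This is a minimal illustration, not code from the source: the function name and interface are assumptions, and the inputs are taken to be summed token log-probabilities of the chosen and rejected responses under the policy and a frozen reference model.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the current policy (pi_*) and a frozen reference
    model (ref_*). The log-ratio margin acts as an implicit reward
    difference, so no separate reward model is trained.
    """
    margin = (pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)
    # -log(sigmoid(beta * margin)), written stably as log1p(exp(-x))
    return math.log1p(math.exp(-beta * margin))
```

When the policy and reference assign the same preference margin, the loss sits at log 2; it shrinks as the policy separates the chosen response from the rejected one beyond the reference model's margin.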

Organizations: 3
Key Papers: 1
First Proposed: 2023 (Rafailov et al.)
Cluster: Alignment Training
Parent Area: RLHF

Tags

function:specification · stage:training · scope:technique

Organizations (3)

Organization        Role
Anthropic           active
Google DeepMind     active
OpenAI              active

Key Papers & Resources (1)