Direct Preference Optimization
Alignment Training · active
Family of reward-free alignment methods (DPO, KTO, IPO, ORPO, GRPO) that bypass explicit reward model training.
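To make "bypass explicit reward model training" concrete, here is a minimal sketch of the per-example DPO objective, the first method in the list above. It assumes the summed token log-probabilities of the chosen and rejected responses have already been computed under both the policy and a frozen reference model. The function name, argument names, and the default `beta=0.1` are illustrative assumptions, not taken from this entry, and the other listed methods use different objectives.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities (hypothetical helper).

    The implicit reward of a response is beta * (policy_logp - ref_logp),
    so no separate reward model is trained.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # written in two branches to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2; the loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference.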
Organizations: 3
Key Papers: 1
Tags: function:specification, stage:training, scope:technique
Organizations (3)
| Organization | Role |
|---|---|
| Anthropic | active |
| Google DeepMind | active |
| OpenAI | active |
Key Papers & Resources (1)
SEMINAL