
Reward Modeling

Alignment Training (active)

Training neural networks on human preference comparisons to provide scalable reward signals for RL fine-tuning.
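
A minimal sketch of the core training objective may help. The snippet below assumes a hypothetical transformer `backbone` that returns per-token hidden states, adds a scalar value head, and trains with a Bradley-Terry pairwise loss on chosen vs. rejected responses; all names are illustrative assumptions, not drawn from any particular implementation.

```python
# Sketch of a pairwise (Bradley-Terry) reward-model loss.
# `backbone` and its output shape are assumptions for illustration.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone                      # e.g. a pretrained transformer encoder
        self.value_head = nn.Linear(hidden_size, 1)   # scalar reward per sequence

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)             # assumed shape: (batch, seq, hidden)
        return self.value_head(hidden[:, -1, :]).squeeze(-1)  # reward read from last token

def preference_loss(model: RewardModel,
                    chosen_ids: torch.Tensor,
                    rejected_ids: torch.Tensor) -> torch.Tensor:
    """Push reward(chosen) above reward(rejected): -log sigmoid(r_c - r_r)."""
    r_chosen = model(chosen_ids)
    r_rejected = model(rejected_ids)
    return -nn.functional.logsigmoid(r_chosen - r_rejected).mean()
```

The trained scalar reward is then used as the optimization target for RL fine-tuning (e.g. PPO) of the policy model.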

Key Papers: 1
First Proposed: 2017 (Christiano et al.)
Cluster: Alignment Training
Parent Area: RLHF

Tags

function:specification · stage:training · scope:technique

Key Papers & Resources (1)

Sub-Areas (1)

Name: Process Supervision
Description: Step-level reward signals for reasoning verification, as opposed to outcome-only rewards (see the sketch below).
Status: active
Orgs: 0
Papers: 1
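
To make the step-level vs. outcome-only distinction concrete, here is an illustrative sketch; `score_step` and `check_final_answer` are hypothetical stand-ins for a learned step verifier and an answer checker, neither of which is specified in this entry.

```python
# Illustrative contrast between outcome-only and process (step-level) rewards
# for a multi-step reasoning trace. All callables here are assumed, not real APIs.
from typing import Callable, List

def outcome_reward(steps: List[str],
                   check_final_answer: Callable[[str], bool]) -> float:
    # One scalar for the whole trajectory, judged only by the final answer.
    return 1.0 if check_final_answer(steps[-1]) else 0.0

def process_reward(steps: List[str],
                   score_step: Callable[[str], float]) -> List[float]:
    # One signal per reasoning step, localizing credit assignment.
    return [score_step(step) for step in steps]
```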