
Process Supervision

Alignment Training · active

Step-level reward signals for reasoning verification, as opposed to outcome-only rewards.
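The contrast can be sketched in a few lines of Python. This is a toy illustration only: `score_step` is a hypothetical stand-in for a learned process reward model (PRM), not the method of Lightman et al., and the step-scoring heuristic is invented for the example.

```python
# Toy contrast between outcome supervision and process supervision.
# score_step is a hypothetical stand-in for a learned process reward
# model (PRM); a real PRM would be a trained classifier over steps.

def score_step(step: str) -> float:
    """Toy PRM stand-in: reward steps that show explicit work."""
    return 1.0 if "=" in step else 0.0

def outcome_reward(final_answer: str, target: str) -> float:
    """Outcome supervision: a single scalar for the final answer only."""
    return 1.0 if final_answer.strip() == target else 0.0

def process_reward(steps: list[str]) -> list[float]:
    """Process supervision: one scalar per reasoning step."""
    return [score_step(s) for s in steps]

steps = ["2 + 3 = 5", "5 * 4 = 20", "so the answer is 20"]
print(outcome_reward("20", "20"))  # one signal for the whole trace
print(process_reward(steps))       # a signal for every step
```

The point of the step-level signal is credit assignment: a trace that reaches the right answer through a flawed step still gets penalized at that step, whereas outcome-only reward would score it perfectly.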

Key Papers: 1
First Proposed: 2023 (Lightman et al., OpenAI)
Cluster: Alignment Training
Parent Area: Reward Modeling

Tags

function:specification · stage:training · scope:technique

Key Papers & Resources

Seminal: Let's Verify Step by Step, Lightman et al. (OpenAI), 2023