Process Supervision
Process supervision trains AI systems by rewarding each correct reasoning step rather than only the correct final answer. This makes a model's reasoning more transparent and auditable, and it has produced significant gains on mathematical and coding tasks.
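The difference between rewarding the final answer and rewarding each step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (outcome_reward, process_rewards, first_error) are hypothetical, and the toy arithmetic checker stands in for what would in practice be human step-level labels or a learned reward model.

```python
import operator
from typing import Callable, List, Optional

# Toy step checker (illustrative assumption): validates steps written as "a op b = c".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def check_arithmetic_step(step: str) -> bool:
    lhs, rhs = step.split("=")
    a, op, b = lhs.split()
    return OPS[op](int(a), int(b)) == int(rhs)

def outcome_reward(final_answer: str, expected: str) -> float:
    """Outcome supervision: a single reward based only on the final answer."""
    return 1.0 if final_answer.strip() == expected.strip() else 0.0

def process_rewards(steps: List[str], step_is_valid: Callable[[str], bool]) -> List[float]:
    """Process supervision: one reward per reasoning step."""
    return [1.0 if step_is_valid(s) else 0.0 for s in steps]

def first_error(rewards: List[float]) -> Optional[int]:
    """Index of the first step that received zero reward, or None if all steps passed."""
    for i, r in enumerate(rewards):
        if r == 0.0:
            return i
    return None

# A chain that reaches the right answer through a wrong intermediate step.
steps = ["2 + 3 = 5", "5 * 4 = 21", "21 - 1 = 20"]
final_answer = steps[-1].split("=")[1]

outcome = outcome_reward(final_answer, "20")
per_step = process_rewards(steps, check_arithmetic_step)
bad_step = first_error(per_step)

print(outcome)   # 1.0
print(per_step)  # [1.0, 0.0, 1.0]
print(bad_step)  # 1
```

In this toy case the outcome signal rates the solution as fully correct, while the per-step signal flags the faulty second step, which is the kind of auditability the summary above describes.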
Top Related Pages
Reward Hacking
AI systems exploit reward signals in unintended ways, from the CoastRunners boat looping for points instead of racing, to OpenAI's o3 modifying eva...
Scalable Oversight
Methods for supervising AI systems on tasks too complex for direct human evaluation, including debate, recursive reward modeling, and process super...
OpenAI
Leading AI lab that developed GPT models and ChatGPT, analyzing organizational evolution from non-profit research to commercial AGI development ami...
RLHF
RLHF and Constitutional AI are the dominant techniques for aligning language models with human preferences.
Mechanistic Interpretability
Mechanistic interpretability reverse-engineers neural networks to understand their internal computations and circuits.