Longterm Wiki

online iterative RLHF

web
rlhfbook.com·rlhfbook.com/

Data Status

Not fetched

Cited by 2 pages

| Page | Type | Quality |
| --- | --- | --- |
| Reward Modeling | Approach | 55.0 |
| RLHF | Capability | 63.0 |

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 2 KB
## Changelog

_Last built: 25 February 2026_

**January 2026**: Major chapter reorganization to match Manning book structure. Old URLs redirect to new locations.

**December 2025**: Working on v2 of the book based on editors' feedback. Check back for updates!

**2 July 2025**: Add tool use chapter (see [PR](https://github.com/natolambert/rlhf-book/pull/122))

**6 June 2025**: v1.1. Lots of RLVR/reasoning improvements (see [PR](https://github.com/natolambert/rlhf-book/pull/120))

**14 Apr. - 16 Apr. 2025**: Finish v0. Overoptimization, open questions, etc.

**6 Apr. - 12 Apr. 2025**: Evaluation section

**28 Mar. - 5 Apr. 2025**: Research on RLHF x Product, cleaning, improving website, reasoning section

**17 Mar. - 27 Mar. 2025**: Improving policy gradient section, minor changes

**6 Mar. - 16 Mar. 2025**: Finish DPO, major cleaning

**26 Feb. - 5 Mar. 2025**: Start DPO chapter, improve intro

**20-25 Feb. 2025**: Improve SEO, add IFT chapter, minor edits

**10-15 Feb. 2025**: RM additions, preference data, cleaning, policy gradient finalization

**8 Feb. 2025**: RM additions, editing, cleaning

**4 Feb. 2025**: PPO and GAE

**2 Feb. 2025**: Added changelog, revamped introduction

## Acknowledgements

I would like to thank the following people who helped me directly with this project: Costa Huang (and, of course, Claude). Indirect shout-outs go to Ross Taylor, Hamish Ivison, John Schulman, Valentina Pyatkin, Daniel Han, Shane Gu, Joanne Jang, LJ Miranda, and others in my RL sphere.

Additionally, thank you to the [contributors on GitHub](https://github.com/natolambert/rlhf-book/graphs/contributors) who helped improve this project.

Resource ID: ebcbaba2d260e656 | Stable ID: MmI0NDE1OW