online iterative RLHF
web · rlhfbook.com · rlhfbook.com/
Data Status: Not fetched
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Reward Modeling | Approach | 55.0 |
| RLHF | Capability | 63.0 |
Cached Content Preview
HTTP 200 · Fetched Feb 26, 2026 · 2 KB
## Changelog

_Last built: 25 February 2026_

- **January 2026**: Major chapter reorganization to match the Manning book structure. Old URLs redirect to new locations.
- **December 2025**: Working on v2 of the book based on editors' feedback! Do check back for updates!
- **2 July 2025**: Add tool use chapter (see [PR](https://github.com/natolambert/rlhf-book/pull/122)).
- **6 June 2025**: v1.1. Lots of RLVR/reasoning improvements (see [PR](https://github.com/natolambert/rlhf-book/pull/120)).
- **14 Apr. - 16 Apr. 2025**: Finish v0. Overoptimization, open questions, etc.
- **6 Apr. - 12 Apr. 2025**: Evaluation section.
- **28 Mar. - 5 Apr. 2025**: Research on RLHF x Product, cleaning, improving website, reasoning section.
- **17 Mar. - 27 Mar. 2025**: Improving policy gradient section, minor changes.
- **6 Mar. - 16 Mar. 2025**: Finish DPO, major cleaning.
- **26 Feb. - 5 Mar. 2025**: Start DPO chapter, improve intro.
- **20 - 25 Feb. 2025**: Improve SEO, add IFT chapter, minor edits.
- **10 - 15 Feb. 2025**: RM additions, preference data, cleaning, policy gradient finalization.
- **8 Feb. 2025**: RM additions, editing, cleaning.
- **4 Feb. 2025**: PPO and GAE.
- **2 Feb. 2025**: Added changelog, revamped introduction.

## Acknowledgements

I would like to thank the following people who helped me directly with this project: Costa Huang (and, of course, Claude). Indirect shout-outs go to Ross Taylor, Hamish Ivison, John Schulman, Valentina Pyatkin, Daniel Han, Shane Gu, Joanne Jang, LJ Miranda, and others in my RL sphere. Additionally, thank you to the [contributors on GitHub](https://github.com/natolambert/rlhf-book/graphs/contributors) who helped improve this project.
Resource ID: ebcbaba2d260e656 | Stable ID: MmI0NDE1OW