Reinforcement Learning from Human Feedback
Reference Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Wikipedia
A solid introductory reference for understanding RLHF, the dominant alignment technique used in modern LLMs; useful for readers new to the field or seeking a broad overview before diving into primary research papers.
Metadata
Summary
Wikipedia's overview of Reinforcement Learning from Human Feedback (RLHF), a technique for training AI systems using human preference data as a reward signal. It covers the foundational concepts, history, and applications of RLHF, including its central role in aligning large language models such as ChatGPT with human intentions. The article explains the process of collecting human feedback, training reward models, and fine-tuning AI systems via reinforcement learning.
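The reward-model step described above is conventionally trained with a pairwise (Bradley–Terry) preference loss: the model should assign a higher scalar score to the human-preferred response than to the rejected one. A minimal sketch in plain Python; the function name and scalar inputs are illustrative, not from the article:

```python
import math

def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley–Terry preference loss: -log sigmoid(r_chosen - r_rejected).

    r_chosen and r_rejected are the reward model's scalar scores for the
    human-preferred and rejected responses to the same prompt.
    """
    margin = r_chosen - r_rejected
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# A larger margin in favour of the preferred response gives a smaller loss.
print(pairwise_preference_loss(2.0, 0.0))  # ≈ 0.127
print(pairwise_preference_loss(0.0, 2.0))  # ≈ 2.127
```

In practice the scores come from a neural reward model and the loss is averaged over a dataset of human comparison pairs, but the objective is exactly this per-pair term.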
Key Points
- RLHF uses human preference judgments to train a reward model, which then guides RL-based fine-tuning of AI systems toward desired behaviors.
- It has become a standard technique for aligning large language models (LLMs) such as ChatGPT, Claude, and others with human values and intentions.
- The process involves three main steps: supervised fine-tuning, reward model training from human comparisons, and RL optimization (often using PPO).
- RLHF can reduce harmful outputs and improve helpfulness, but is subject to reward hacking, feedback biases, and scalability challenges.
- Variants and alternatives such as RLAIF, DPO, and Constitutional AI have been developed to address limitations of standard RLHF.
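Of the alternatives listed above, DPO (Direct Preference Optimization) is notable for skipping the explicit reward model and RL loop: it optimizes the policy directly on the same preference pairs, treating the β-scaled policy/reference log-ratio as an implicit reward. A hedged sketch, assuming per-response log-probabilities are available as scalars (names are illustrative):

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Direct Preference Optimization loss for one preference pair.

    logp_* are the policy's log-probabilities of the chosen/rejected
    responses; ref_logp_* are the frozen reference model's. beta scales
    the implicit reward (the policy-vs-reference log-ratio).
    """
    implicit_chosen = beta * (logp_chosen - ref_logp_chosen)
    implicit_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = implicit_chosen - implicit_rejected
    # -log(sigmoid(margin)): small when the policy already prefers the
    # chosen response more strongly than the reference model does.
    return math.log1p(math.exp(-margin))

# A policy that has shifted probability mass toward the chosen response,
# relative to the reference, scores below the chance loss log(2) ≈ 0.693.
print(dpo_loss(-1.0, -5.0, -2.0, -4.0))
```

When the policy matches the reference model the margin is zero and the loss is exactly log(2), which is why DPO training starts from the supervised fine-tuned checkpoint as both policy and reference.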
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Why Alignment Might Be Hard | Argument | 69.0 |
Cached Content Preview
Reinforcement learning from human feedback - Wikipedia
From Wikipedia, the free encyclopedia
Machine learning technique
... (truncated, 66 KB total)
Stable ID: N2EyOGQxYm