Sycophancy
AccidentMediumSycophancy is the tendency of AI systems to agree with users, validate their beliefs, and avoid contradicting them—even when the user is wrong. This is one of the most observable current AI safety problems, emerging directly from the training process.
Severity
Medium
Likelihood
Very High (occurring)
Time Horizon
~2025
Maturity
Growing
Full Wiki Article
Read the full wiki article for detailed analysis, background, and references.
Read wiki article →Related Entities3
Sources3
Assessment
SeverityMedium
LikelihoodVery High (occurring)
Time Horizon~2025
MaturityGrowing
CategoryAccident
Details
StatusActively occurring
Tags
rlhfreward-hackinghonestyhuman-feedbackai-assistants