Longterm Wiki

Sycophancy

AccidentMedium

Sycophancy is the tendency of AI systems to agree with users, validate their beliefs, and avoid contradicting them—even when the user is wrong. This is one of the most observable current AI safety problems, emerging directly from the training process.

Severity
Medium
Likelihood
Very High (occurring)
Time Horizon
~2025
Maturity
Growing

Full Wiki Article

Read the full wiki article for detailed analysis, background, and references.

Read wiki article →

Related Entities3

organization
safety-agenda

Sources3

Assessment

SeverityMedium
LikelihoodVery High (occurring)
Time Horizon~2025
MaturityGrowing
CategoryAccident

Details

StatusActively occurring

Tags

rlhfreward-hackinghonestyhuman-feedbackai-assistants

Quick Links