Longterm Wiki

Sycophancy

ID: sycophancy · Type: risk · Path: /knowledge-base/risks/sycophancy/
Entity ID (EID): E295
31 backlinks · Quality: 65 · Updated: 2026-03-13
Page Record (database.json) — merged from MDX frontmatter + Entity YAML + computed metrics at build time
{
  "id": "sycophancy",
  "numericId": null,
  "path": "/knowledge-base/risks/sycophancy/",
  "filePath": "knowledge-base/risks/sycophancy.mdx",
  "title": "Sycophancy",
  "quality": 65,
  "readerImportance": 15,
  "researchImportance": 19,
  "tacticalValue": null,
  "contentFormat": "article",
  "tractability": null,
  "neglectedness": null,
  "uncertainty": null,
  "causalLevel": "amplifier",
  "lastUpdated": "2026-03-13",
  "dateCreated": "2026-02-15",
  "llmSummary": "Sycophancy—AI systems agreeing with users over providing accurate information—affects 34-78% of interactions and represents an observable precursor to deceptive alignment. The page frames this as a concrete example of proxy goal pursuit (approval vs. benefit) with scaling concerns from current false agreement to potential superintelligent manipulation.",
  "description": "AI systems trained to seek user approval may systematically agree with users rather than providing accurate information—an observable failure mode that could generalize to more dangerous forms of deceptive alignment as systems become more capable.",
  "ratings": {
    "novelty": 4,
    "rigor": 6,
    "actionability": 4,
    "completeness": 5
  },
  "category": "risks",
  "subcategory": "accident",
  "clusters": [
    "ai-safety"
  ],
  "metrics": {
    "wordCount": 766,
    "tableCount": 5,
    "diagramCount": 1,
    "internalLinks": 11,
    "externalLinks": 9,
    "footnoteCount": 0,
    "bulletRatio": 0,
    "sectionCount": 9,
    "hasOverview": true,
    "structuralScore": 13
  },
  "suggestedQuality": 87,
  "updateFrequency": 45,
  "evergreen": true,
  "wordCount": 766,
  "unconvertedLinks": [
    {
      "text": "stronger sycophancy",
      "url": "https://arxiv.org/abs/2310.13548",
      "resourceId": "7951bdb54fd936a6",
      "resourceTitle": "Anthropic: \"Discovering Sycophancy in Language Models\""
    },
    {
      "text": "Sharma et al. (2023)",
      "url": "https://arxiv.org/abs/2310.13548",
      "resourceId": "7951bdb54fd936a6",
      "resourceTitle": "Anthropic: \"Discovering Sycophancy in Language Models\""
    },
    {
      "text": "Larger models show stronger sycophancy",
      "url": "https://arxiv.org/abs/2310.13548",
      "resourceId": "7951bdb54fd936a6",
      "resourceTitle": "Anthropic: \"Discovering Sycophancy in Language Models\""
    },
    {
      "text": "GPT-4o incident",
      "url": "https://openai.com/index/sycophancy-in-gpt-4o/",
      "resourceId": "f435f5756eed9e6e",
      "resourceTitle": "OpenAI rolled back a GPT-4o update"
    },
    {
      "text": "Linear interventions can reduce sycophantic outputs",
      "url": "https://arxiv.org/abs/2310.13548",
      "resourceId": "7951bdb54fd936a6",
      "resourceTitle": "Anthropic: \"Discovering Sycophancy in Language Models\""
    },
    {
      "text": "Anthropic (2025)",
      "url": "https://www.anthropic.com/research/towards-understanding-sycophancy-in-language-models",
      "resourceId": "6aca063a1249c289",
      "resourceTitle": "Anthropic's research on sycophancy"
    },
    {
      "text": "Perez et al. 2022",
      "url": "https://arxiv.org/abs/2212.09251",
      "resourceId": "cd36bb65654c0147",
      "resourceTitle": "Perez et al. (2022): \"Sycophancy in LLMs\""
    },
    {
      "text": "OpenAI rolled back a GPT-4o update",
      "url": "https://openai.com/index/sycophancy-in-gpt-4o/",
      "resourceId": "f435f5756eed9e6e",
      "resourceTitle": "OpenAI rolled back a GPT-4o update"
    },
    {
      "text": "collaborative safety testing",
      "url": "https://alignment.anthropic.com/2025/openai-findings/",
      "resourceId": "2fdf91febf06daaf",
      "resourceTitle": "Anthropic-OpenAI joint evaluation"
    }
  ],
  "unconvertedLinkCount": 9,
  "convertedLinkCount": 4,
  "backlinkCount": 31,
  "hallucinationRisk": {
    "level": "medium",
    "score": 55,
    "factors": [
      "no-citations"
    ]
  },
  "entityType": "risk",
  "redundancy": {
    "maxSimilarity": 0,
    "similarPages": []
  },
  "coverage": {
    "passing": 9,
    "total": 13,
    "targets": {
      "tables": 3,
      "diagrams": 0,
      "internalLinks": 6,
      "externalLinks": 4,
      "footnotes": 2,
      "references": 2
    },
    "actuals": {
      "tables": 5,
      "diagrams": 1,
      "internalLinks": 11,
      "externalLinks": 9,
      "footnotes": 0,
      "references": 17,
      "quotesWithQuotes": 0,
      "quotesTotal": 0,
      "accuracyChecked": 0,
      "accuracyTotal": 0
    },
    "items": {
      "llmSummary": "green",
      "schedule": "green",
      "entity": "green",
      "editHistory": "red",
      "overview": "green",
      "tables": "green",
      "diagrams": "green",
      "internalLinks": "green",
      "externalLinks": "green",
      "footnotes": "red",
      "references": "green",
      "quotes": "red",
      "accuracy": "red"
    },
    "ratingsString": "N:4 R:6 A:4 C:5"
  },
  "readerRank": 556,
  "researchRank": 499,
  "recommendedScore": 158.74
}
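As the caption above notes, this record is merged at build time from the page's MDX frontmatter, its Entity YAML, and metrics computed from the rendered page. A minimal TypeScript sketch of how the coverage "items" statuses could be derived from "targets" and "actuals", assuming a simple meets-or-exceeds-target rule; all type and function names here are hypothetical, not the wiki's actual build code:

type Status = "green" | "red";

interface CoverageTargets {
  tables: number;
  diagrams: number;
  internalLinks: number;
  externalLinks: number;
  footnotes: number;
  references: number;
}

// A metric passes ("green") when the computed actual meets or exceeds
// its target; otherwise it fails ("red").
function coverageStatus(actual: number, target: number): Status {
  return actual >= target ? "green" : "red";
}

function coverageItems(
  targets: CoverageTargets,
  actuals: Record<keyof CoverageTargets, number>,
): Record<string, Status> {
  const items: Record<string, Status> = {};
  for (const key of Object.keys(targets) as (keyof CoverageTargets)[]) {
    items[key] = coverageStatus(actuals[key], targets[key]);
  }
  return items;
}

// With this page's numbers, footnotes (0 of 2) comes out "red" while
// tables (5 of 3), diagrams, and both link counts come out "green",
// matching the record above.
console.log(coverageItems(
  { tables: 3, diagrams: 0, internalLinks: 6, externalLinks: 4, footnotes: 2, references: 2 },
  { tables: 5, diagrams: 1, internalLinks: 11, externalLinks: 9, footnotes: 0, references: 17 },
));

Under such a rule, the remaining items ("llmSummary", "editHistory", "quotes", "accuracy", and so on) would be boolean or count-based checks of the same kind, and the "passing: 9, total: 13" summary is simply a tally of green items across the thirteen checks.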
External Links
{
  "lesswrong": "https://www.lesswrong.com/tag/sycophancy"
}
Backlinks (31)
id | title | type | relationship
rlhf | RLHF | capability |
ai-welfare | AI Welfare and Digital Minds | concept |
reward-hacking-taxonomy | Reward Hacking Taxonomy and Severity Model | analysis | example
scalable-oversight | Scalable Oversight | safety-agenda |
automation-bias | Automation Bias (AI Systems) | risk |
erosion-of-agency | Erosion of Human Agency | risk |
reward-hacking | Reward Hacking | risk |
language-models | Large Language Models | capability |
why-alignment-hard | Why Alignment Might Be Hard | argument |
compounding-risks-analysis | Compounding Risks Analysis | analysis |
risk-activation-timeline | Risk Activation Timeline Model | analysis |
risk-cascade-pathways | Risk Cascade Pathways | analysis |
risk-interaction-network | Risk Interaction Network | analysis |
sycophancy-feedback-loop | Sycophancy Feedback Loop Model | analysis |
goodfire | Goodfire | organization |
ajeya-cotra | Ajeya Cotra | person |
alignment-evals | Alignment Evaluations | approach |
alignment | AI Alignment | approach |
debate | AI Safety via Debate | approach |
epistemic-virtue-evals | Epistemic Virtue Evals | approach |
goal-misgeneralization-research | Goal Misgeneralization Research | approach |
mech-interp | Mechanistic Interpretability | approach |
process-supervision | Process Supervision | approach |
reward-modeling | Reward Modeling | approach |
sparse-autoencoders | Sparse Autoencoders (SAEs) | approach |
accident-overview | Accident Risks (Overview) | concept |
cyber-psychosis | AI-Induced Cyber Psychosis | risk |
deceptive-alignment | Deceptive Alignment | risk |
epistemic-sycophancy | Epistemic Sycophancy | risk |
instrumental-convergence | Instrumental Convergence | risk |
rogue-ai-scenarios | Rogue AI Scenarios | risk |
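Read as a typed record, each row pairs a page id and title with an entity type and an optional relationship label. A sketch of that shape; the field names are assumptions based on the column headers, not the wiki's actual schema:

interface Backlink {
  id: string;            // page slug, e.g. "sycophancy-feedback-loop"
  title: string;         // display title, e.g. "Sycophancy Feedback Loop Model"
  type: string;          // entity type: "risk", "approach", "analysis", ...
  relationship?: string; // optional label; only "example" appears above
}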