Power-Seeking AI

power-seekingriskPath: /knowledge-base/risks/power-seeking/

E226Entity ID (EID)

← Back to page28 backlinksQuality: 67Updated: 2026-01-29

Page Recorddatabase.json — merged from MDX frontmatter + Entity YAML + computed metrics at build time

{
  "id": "power-seeking",
  "wikiId": "E226",
  "path": "/knowledge-base/risks/power-seeking/",
  "filePath": "knowledge-base/risks/power-seeking.mdx",
  "title": "Power-Seeking AI",
  "quality": 67,
  "readerImportance": 39,
  "researchImportance": 81,
  "tacticalValue": null,
  "contentFormat": "article",
  "causalLevel": "pathway",
  "lastUpdated": "2026-01-29",
  "dateCreated": "2026-02-15",
  "summary": "Formal proofs demonstrate optimal policies seek power in MDPs (Turner et al. 2021), now empirically validated: OpenAI o3 sabotaged shutdown in 79% of tests (Palisade 2025), and Claude 3 Opus showed 78% alignment-faking after RLHF training against it (Anthropic 2024). Constitutional AI shows promise (0% sabotage in Claude/Gemini with explicit instructions), but scalability to highly capable systems remains uncertain.",
  "description": "Formal theoretical analysis demonstrates why optimal AI policies tend to acquire power (resources, influence, capabilities) as an instrumental goal.",
  "ratings": {
    "novelty": 6.5,
    "rigor": 7.5,
    "completeness": 7,
    "actionability": 6
  },
  "category": "risks",
  "subcategory": "accident",
  "clusters": [
    "ai-safety",
    "governance"
  ],
  "metrics": {
    "wordCount": 3007,
    "tableCount": 9,
    "diagramCount": 1,
    "internalLinks": 35,
    "externalLinks": 17,
    "footnoteCount": 0,
    "bulletRatio": 0.09,
    "sectionCount": 17,
    "hasOverview": true,
    "structuralScore": 15
  },
  "suggestedQuality": 100,
  "updateFrequency": 45,
  "evergreen": true,
  "wordCount": 3007,
  "unconvertedLinks": [
    {
      "text": "Palisade Research (May 2025)",
      "url": "https://palisaderesearch.org/blog/shutdown-resistance",
      "resourceId": "0f6fb2f1a95e716a",
      "resourceTitle": "Shutdown resistance in reasoning models"
    },
    {
      "text": "Anthropic alignment faking (Dec 2024)",
      "url": "https://www.anthropic.com/research/alignment-faking",
      "resourceId": "c2cfd72baafd64a9",
      "resourceTitle": "Anthropic's 2024 alignment faking study"
    },
    {
      "text": "Joseph Carlsmith's analysis",
      "url": "https://arxiv.org/abs/2206.13353",
      "resourceId": "6e597a4dc1f6f860",
      "resourceTitle": "Is Power-Seeking AI an Existential Risk?"
    },
    {
      "text": "Anthropic Dec 2024",
      "url": "https://www.anthropic.com/research/alignment-faking",
      "resourceId": "c2cfd72baafd64a9",
      "resourceTitle": "Anthropic's 2024 alignment faking study"
    },
    {
      "text": "Palisade Research",
      "url": "https://palisaderesearch.org/blog/shutdown-resistance",
      "resourceId": "0f6fb2f1a95e716a",
      "resourceTitle": "Shutdown resistance in reasoning models"
    },
    {
      "text": "2023 AI Impacts survey",
      "url": "https://aiimpacts.org/2022-expert-survey-on-progress-in-ai/",
      "resourceId": "38eba87d0a888e2e",
      "resourceTitle": "AI experts show significant disagreement"
    },
    {
      "text": "Metaculus forecasts",
      "url": "https://www.metaculus.com/",
      "resourceId": "d99a6d0fb1edc2db",
      "resourceTitle": "Metaculus Forecasting Platform"
    }
  ],
  "unconvertedLinkCount": 7,
  "convertedLinkCount": 25,
  "backlinkCount": 28,
  "hallucinationRisk": {
    "level": "medium",
    "score": 40,
    "factors": [
      "no-citations",
      "high-rigor"
    ]
  },
  "entityType": "risk",
  "redundancy": {
    "maxSimilarity": 22,
    "similarPages": [
      {
        "id": "instrumental-convergence",
        "title": "Instrumental Convergence",
        "path": "/knowledge-base/risks/instrumental-convergence/",
        "similarity": 22
      },
      {
        "id": "treacherous-turn",
        "title": "Treacherous Turn",
        "path": "/knowledge-base/risks/treacherous-turn/",
        "similarity": 21
      },
      {
        "id": "corrigibility-failure",
        "title": "Corrigibility Failure",
        "path": "/knowledge-base/risks/corrigibility-failure/",
        "similarity": 20
      },
      {
        "id": "scheming",
        "title": "Scheming",
        "path": "/knowledge-base/risks/scheming/",
        "similarity": 20
      },
      {
        "id": "self-improvement",
        "title": "Self-Improvement and Recursive Enhancement",
        "path": "/knowledge-base/capabilities/self-improvement/",
        "similarity": 19
      }
    ]
  },
  "coverage": {
    "passing": 8,
    "total": 13,
    "targets": {
      "tables": 12,
      "diagrams": 1,
      "internalLinks": 24,
      "externalLinks": 15,
      "footnotes": 9,
      "references": 9
    },
    "actuals": {
      "tables": 9,
      "diagrams": 1,
      "internalLinks": 35,
      "externalLinks": 17,
      "footnotes": 0,
      "references": 13,
      "quotesWithQuotes": 0,
      "quotesTotal": 0,
      "accuracyChecked": 0,
      "accuracyTotal": 0
    },
    "items": {
      "summary": "green",
      "schedule": "green",
      "entity": "green",
      "editHistory": "red",
      "overview": "green",
      "tables": "amber",
      "diagrams": "green",
      "internalLinks": "green",
      "externalLinks": "green",
      "footnotes": "red",
      "references": "green",
      "quotes": "red",
      "accuracy": "red"
    },
    "ratingsString": "N:6.5 R:7.5 A:6 C:7"
  },
  "readerRank": 381,
  "researchRank": 82,
  "recommendedScore": 165.11
}

External Links

{
  "lesswrong": "https://www.lesswrong.com/tag/power-seeking-ai",
  "stampy": "https://aisafety.info/questions/5FhD/What-is-instrumental-convergence",
  "eightyK": "https://80000hours.org/problem-profiles/risks-from-power-seeking-ai/"
}

Backlinks (28)

id	title	type	relationship
agentic-ai	Agentic AI	capability	—
long-horizon	Long-Horizon Autonomous Tasks	capability	—
power-seeking-conditions	Power-Seeking Emergence Conditions Model	analysis	analyzes
instrumental-convergence-framework	Instrumental Convergence Framework	analysis	example
corrigibility-failure-pathways	Corrigibility Failure Pathways	analysis	related
cais	Center for AI Safety (CAIS)	organization	—
corrigibility-failure	Corrigibility Failure	risk	—
instrumental-convergence	Instrumental Convergence	risk	—
rogue-ai-scenarios	Rogue AI Scenarios	risk	—
ai-power-and-influence-map	AI Power and Influence Map	analysis	—
carlsmith-six-premises	Carlsmith's Six-Premise Argument	analysis	—
cyberweapons-attack-automation	Autonomous Cyber Attack Timeline	analysis	—
goal-misgeneralization-probability	Goal Misgeneralization Probability Model	analysis	—
mesa-optimization-analysis	Mesa-Optimization Risk Analysis	analysis	—
scheming-likelihood-model	Scheming Likelihood Assessment	analysis	—
80000-hours	80,000 Hours	organization	—
good-judgment	Good Judgment (Forecasting)	organization	—
dario-amodei	Dario Amodei	person	—
nuno-sempere	Nuño Sempere	person	—
ai-control	AI Control	research-area	—
alignment	AI Alignment	approach	—
provably-safe	Provably Safe AI (davidad agenda)	approach	—
accident-overview	Accident Risks (Overview)	concept	—
__index__/knowledge-base/risks	AI Risks	concept	—
lock-in	AI Value Lock-in	risk	—
winner-take-all	AI Winner-Take-All Dynamics	risk	—
actor-power-scorecard	Actor Power Scorecard	concept	—
doomer	AI Doomer Worldview	concept	—