AI Risks
Overview
This section documents the potential risks from advanced AI systems, organized into four major categories based on the source and nature of the risk.
Risk Categories
Unintended failures from AI systems pursuing misaligned goals:
- Scheming - AI strategically concealing misaligned goals
- Deceptive Alignment - Models appearing aligned during training
- Mesa-Optimization - Learned optimizers with misaligned objectives
- Goal Misgeneralization - Learned goals that generalize incorrectly outside the training distribution
- Power-Seeking - Instrumental convergence toward acquiring resources
Deliberate harmful applications of AI capabilities:
- Bioweapons - AI-assisted biological weapon development
- Cyberweapons - Automated cyber attacks and vulnerability exploitation
- Disinformation - Large-scale manipulation campaigns
- Autonomous Weapons - Lethal autonomous systems
Systemic issues from how AI development is organized:
- Racing Dynamics - Competitive pressure reducing safety investment
- Concentration of Power - Dangerous accumulation of AI capabilities
- Lock-in - Irreversible entrenchment of values or structures
- Economic Disruption - Labor market and economic instability
Threats to society’s ability to know and reason:
- Trust Decline - Erosion of institutional and interpersonal trust
- Authentication Collapse - Inability to verify authentic content
- Expertise Atrophy - Loss of human capability through AI dependence
How Risks Connect
Many risks interact and compound. For example:
- Racing dynamics → reduced safety testing → higher accident risk
- Disinformation → trust decline → reduced coordination capacity
- Power concentration → lock-in potential → governance failures
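The compounding chains above can be viewed as a small directed graph, where an edge from risk A to risk B means "A exacerbates B"; tracing reachability then surfaces indirect, compounded effects. A minimal sketch (all node names and edges are illustrative, taken only from the example chains above):

```python
# Illustrative only: edges encode "this risk exacerbates that one",
# using the example chains from the list above.
edges = {
    "racing dynamics": ["reduced safety testing"],
    "reduced safety testing": ["higher accident risk"],
    "disinformation": ["trust decline"],
    "trust decline": ["reduced coordination capacity"],
    "power concentration": ["lock-in potential"],
    "lock-in potential": ["governance failures"],
}

def downstream(risk: str) -> set[str]:
    """Return every risk reachable from `risk` via exacerbation edges."""
    seen: set[str] = set()
    stack = [risk]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print(sorted(downstream("racing dynamics")))
# ['higher accident risk', 'reduced safety testing']
```

A fuller interaction matrix would add cross-category edges (for example, disinformation feeding racing dynamics), at which point reachability analysis like this becomes more informative than reading chains in isolation.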
See the Risk Interaction Matrix for detailed analysis.