Longterm Wiki

Author

paulfchristiano

Credibility Rating

Good (3/5)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Alignment Forum

Data Status

Not fetched

Cited by 2 pages

| Page | Type | Quality |
|------|------|---------|
| [Paul Christiano] | Person | 39.0 |
| [AI Doomer Worldview] | Concept | 38.0 |

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 65 KB

Contents:

- [Part I: You get what you measure](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like#Part_I__You_get_what_you_measure)
- [Part II: influence-seeking behavior is scary](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like#Part_II__influence_seeking_behavior_is_scary)

[Best of LessWrong 2019](https://www.alignmentforum.org/bestoflesswrong?year=2019&category=all)

Tags: [AI Risk](https://www.alignmentforum.org/w/ai-risk) · [Threat Models (AI)](https://www.alignmentforum.org/w/threat-models-ai) · [AI Takeoff](https://www.alignmentforum.org/w/ai-takeoff) · [More Dakka](https://www.alignmentforum.org/w/more-dakka) · [AI](https://www.alignmentforum.org/w/ai) · [World Modeling](https://www.alignmentforum.org/w/world-modeling) · [World Optimization](https://www.alignmentforum.org/w/world-optimization) · [Curated](https://www.alignmentforum.org/recommendations)


# [What failure looks like](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like)

by [paulfchristiano](https://www.alignmentforum.org/users/paulfchristiano?from=post_header)

17th Mar 2019

10 min read

[55 comments](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like#comments)

Karma: 106

[Review by orthonormal](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like#dFcfbCL5xW6SfPRqo)

The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity.

I think this is probably not what failure will look like, and I want to try to paint a more realistic picture. I’ll tell the story in two parts:

- **Part I**: machine learning will increase our ability to “get what we can measure,” which could cause a slow-rolling catastrophe. ("Going out with a whimper.")
- **Part II**: ML training, like competitive economies or natural ecosystems, can give rise to “greedy” patterns that try to expand their own influence. Such patterns can ultimately dominate the behavior of a system and cause sudden breakdowns. ("Going out with a bang," an instance of [optimization daemons](https://www.alignmentforum.org/w/daemons).)

I think these are the most important problems if we fail to solve [intent alignment](https://ai-alignment.com/clarifying-ai-alignment-cec47cd69dd6).

In practice these problems will interact with each other, and with other disruptions/instability caused by rapid progress. These problems are worse in worlds where progress is relatively fast, and fast takeoff can be a key risk factor, but I’m scared even if we have several years.

... (truncated, 65 KB total)
Resource ID: 6807a8a8f2fd23f3 | Stable ID: OGFlNWM2Yz