MIRI announces new "Death With Dignity" strategy
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: LessWrong
A widely-read and controversial 2022 post from Eliezer Yudkowsky representing a notably pessimistic shift in MIRI's public stance, often cited in discussions of AI doom timelines and the psychological/strategic framing of AI safety work.
Forum Post Details
Metadata
Summary
Eliezer Yudkowsky argues in April 2022 that humanity is extremely unlikely to solve AI alignment before advanced AI causes an existential catastrophe. Rather than abandoning work entirely, he proposes reframing AI safety efforts as helping humanity 'die with dignity'—doing work that at least creates a historical record of genuine effort, even if survival is deemed nearly impossible.
Key Points
- Yudkowsky expresses deep pessimism that alignment will be solved in time, placing humanity's survival probability near 0%.
- He proposes 'death with dignity' as an emotional reframe: continuing safety work to improve humanity's historical record rather than to secure survival.
- He critiques existing approaches, including Paul Christiano's schemes and Chris Olah's interpretability work, as insufficient given real-world timelines and constraints.
- He argues that transparency and interpretability work still has value if it enables even a failed attempt to warn decision-makers of existential danger.
- The post reflects MIRI's institutional shift toward pessimism about transformative AI safety outcomes and away from optimistic alignment research roadmaps.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Value Lock-in | Risk | 64.0 |
Cached Content Preview
# MIRI announces new "Death With Dignity" strategy
By Eliezer Yudkowsky
Published: 2022-04-02
tl;dr: It's obvious at this point that humanity isn't going to solve the alignment problem, or even try very hard, or even go out with much of a fight. Since survival is unattainable, we should shift the focus of our efforts to helping humanity die with slightly more dignity.
* * *
Well, let's be frank here. MIRI didn't solve AGI alignment and at least knows that it didn't. Paul Christiano's incredibly complicated schemes have no chance of working in real life before DeepMind destroys the world. Chris Olah's transparency work, at current rates of progress, will at best let somebody at DeepMind give a highly speculative warning about how the current set of enormous inscrutable tensors, inside a system that was recompiled three weeks ago and has now been training by gradient descent for 20 days, might possibly be planning to start trying to deceive its operators.
Management will then ask what they're supposed to do about that.
Whoever detected the warning sign will say that there isn't anything known they can do about that. Just because you can see the system might be planning to kill you, doesn't mean that there's any known way to build a system that won't do that. Management will then decide not to shut down the project - because it's not certain that the intention was really there or that the AGI will really follow through, because other AGI projects are hard on their heels, because if all those gloomy prophecies are true then there's nothing anybody can do about it anyways. Pretty soon that troublesome error signal will vanish.
When Earth's prospects are that far underwater in the basement of the [logistic success curve](https://intelligence.org/2017/11/26/security-mindset-and-the-logistic-success-curve/), it may be hard to feel motivated about continuing to fight, since doubling our chances of survival will only take them from 0% to 0%.
That's why I would suggest reframing the problem - especially on an emotional level - to helping humanity *die with dignity,* or rather, since even this goal is realistically unattainable at this point, *die with slightly more dignity than would otherwise be counterfactually obtained.*
Consider the world if Chris Olah had never existed. It's then much more likely that nobody will even *try and fail* to adapt Olah's methodologies to try and read complicated facts about internal intentions and future plans, out of whatever enormous inscrutable tensors are being integrated a million times per second, inside of whatever recently designed system finished training 48 hours ago, in a vast GPU farm that's already helpfully connected to the Internet.
It is more dignified for humanity - a better look on our tombstone - if we die *after the management of the AGI project was heroically warned of the dangers* but came up with totally reasonable reasons to go ahead anyways.
Or, failing that, if people made *a
... (truncated, 31 KB total)