Eliezer Yudkowsky: Track Record
This page documents Eliezer Yudkowsky’s public predictions and claims to assess his epistemic track record. His record is genuinely mixed: significant errors early in his career (particularly on timelines), initial skepticism of deep learning (common at the time), but notable vindication on conceptual points about AI generalization.
Summary Assessment
| Category | Count | Notes |
|---|---|---|
| Clearly Correct | 3-4 | AI generalization with simple architectures, AI safety becoming mainstream, IMO gold medal bet |
| Partially Correct | 3-4 | Interpretability challenges, RLHF limitations, mesa-optimization concerns |
| Clearly Wrong | 4-5 | Early timeline predictions (Singularity by 2021), deep learning skepticism timing, nanotech predictions |
| Pending/Unfalsifiable | 6+ | P(doom) estimates, discontinuous takeoff, deceptive alignment, GPT-5 consciousness |
Overall pattern: Made significant errors when young (early timeline predictions); updated to timeline agnosticism; vindicated on AI generalization question in Hanson debate; core doom predictions remain unfalsifiable until AGI exists.
Major Debates
Yann LeCun Twitter Debate (April 2023)
| Date | Claim | Type | Status | Source |
|---|---|---|---|---|
| Apr 2023 | LeCun’s proposed architecture doesn’t address alignment | Twitter debate | ⏳ Ongoing dispute | LessWrong |
| Apr 2023 | “Talking about that falling asteroid will depress high-school students isn’t a good reason not to talk about the asteroid” | Twitter debate | N/A (position statement) | Zvi’s Analysis |
Key exchanges:
- LeCun called Yudkowsky’s arguments “vague hand-waving arguments” lacking technical rigor
- LeCun claimed his architecture “is a way to guarantee that AI systems be steerable and aligned”
- Yudkowsky: “A quick skim of [Yann LeCun’s 60 page paper] showed nothing about alignment”
- LeCun: “The ‘hard take-off’ scenario is utterly impossible”
- Yudkowsky: “My objection is not that you’re staking everyone’s life on what you believe… but that you are staking everyone’s life on propositions that seem not just uncertain but probably false”
2024 continuation: LeCun stated MIRI’s “main goal is to shut down AI development” and compared it to “apocalyptic and survivalist cults.”
Robin Hanson FOOM Debate (2008-ongoing)
| Date | Claim | Type | Status | Source |
|---|---|---|---|---|
| 2008 | Recursive self-improvement enables rapid intelligence explosion | Formal debate | ⏳ Pending | MIRI |
| 2008 | Simple architectures will generalize broadly across domains | Formal debate | ✅ Vindicated (GPT) | LessWrong |
| 2008 | A “small project machine in a basement” could become powerful enough to take over the world over a weekend | Formal debate | ⏳ Pending | AI-FOOM Debate |
Yudkowsky’s position:
- Recursive self-improvement would enable rapid intelligence explosion
- Simple architectures would generalize broadly across domains
Hanson’s position:
- Progress would be slow and gradual
- Would need many specialized systems for different domains
- AI self-improvement would be incremental like human R&D (see the toy growth sketch at the end of this subsection)
Outcome assessment:
- Yudkowsky claimed vindication: “Robin did not think something like GPT-3 should exist; Robin thought you should need to train lots of specific domains that didn’t generalize”
- Yudkowsky: “reality was far to the Eliezer side of Eliezer on the Eliezer-Robin axis”
- LessWrong analysis concluded: “when you examine these predictions, Hanson probably does a little better than Yudkowsky” on object-level predictions
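The growth-rate disagreement above can be made concrete with a toy numerical sketch. This is purely illustrative and assumes a one-variable growth law dC/dt = C^alpha that neither debater literally proposed; the exponent alpha simply stands in for how strongly returns on self-improvement compound.

```python
# Toy sketch (not either debater's actual model) of the growth-rate dispute:
# if capability growth scales super-linearly with current capability, growth
# runs away ("FOOM"); if returns diminish, growth stays gradual.
def grow(alpha: float, steps: int = 30, dt: float = 0.1) -> float:
    """Euler-integrate dC/dt = C**alpha starting from C = 1."""
    c = 1.0
    for _ in range(steps):
        c += dt * c ** alpha
    return c

print(f"Diminishing returns (alpha=0.9, Hanson-flavored): {grow(0.9):.3g}")
print(f"Compounding returns (alpha=1.5, foom-flavored):   {grow(1.5):.3g}")
```

Over the same horizon and from the same starting point, the sub-linear case ends up roughly an order of magnitude above where it started, while the super-linear case has already exploded by many orders of magnitude.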
Paul Christiano Disagreements (2015-ongoing)
| Date | Claim | Type | Status | Source |
|---|---|---|---|---|
| 2015+ | AI takeoff will be discontinuous/fast | Formal writing | ⏳ Pending | Alignment Forum |
| 2015+ | Alignment difficulty is high enough that current approaches won’t work | Formal writing | ⏳ Pending | Same |
| 2021 | AI will achieve IMO gold by end of 2025 (>16% probability) | Formal bet | ✅ Correct | LessWrong |
Key disagreements:
- Takeoff speeds:
  - Christiano: “AI improving itself is most likely to look like AI systems doing R&D in the same way that humans do”
  - Yudkowsky expects “AI systems performing extremely fast recursive self-improvement before those systems are able to make superhuman contributions to other domains”
- Difficulty of alignment:
  - Christiano: “Eliezer seems confident about the difficulty of alignment based largely on his own experiences working on the problem. But in fact society has spent very little total effort working on the problem”
  - Christiano: “MIRI itself would probably be unable to solve or even make significant progress on the large majority of problems that existing research fields routinely solve”
- Epistemic confidence:
  - Christiano: “Eliezer is not doing the type of reasoning that can justifiably defend the level of confidence he claims to have”
Areas of agreement:
- AI systems could “deliberately and irreversibly disempower humanity”
- “There won’t necessarily be a ‘fire alarm’”
- Many alignment projects “aren’t making progress on key difficulties”
George Hotz Debate (August 2023)
| Date | Claim | Type | Status | Source |
|---|---|---|---|---|
| Aug 2023 | Sufficiently intelligent ASI systems will be “suns to our planets” - so intelligent they are inherently inscrutable and uncontrollable | Livestream debate | ⏳ Pending | Zvi Summary |
| Aug 2023 | First move of superintelligence would be to “take out the humans” because “humans can build other ASIs” | Livestream debate | ⏳ Pending | Same |
Hotz’s position:
- Over 10 years from 2023, it is “not possible for a superintelligence to exist and gain the hard power to kill humanity”
- AIs will be “chill” and provide benefits like self-driving cars
Key exchange:
- On singularity proximity: “Hotz believes we’re not close, Yudkowsky thinks it’s possible we’re close. They agree that if we’re close we’re toast”
Analysis: Debate “went quite well for the first half or so, then things went increasingly off the rails in the second half”
Richard Ngo Debates (2021-2022)
| Date | Claim | Type | Status | Source |
|---|---|---|---|---|
| 2021-22 | The hope that “maybe there’s an easy way” to safely use AI is wrong | Discord/MIRI conversations | ⏳ Pending | LessWrong |
Key disagreements:
- Ngo: “Eliezer is not doing the type of reasoning that can justifiably defend the level of confidence he claims to have”
- Ngo pressed for predictions rather than postdictions; “the extent to which Eliezer seemed confused that I cared about this was a noticeable update for me”
Sam Altman Exchanges (2022-2023)
| Date | Claim | Type | Status | Source |
|---|---|---|---|---|
| 2022-23 | OpenAI should run on custom chips to prevent code theft | In-person conversation | N/A (advice) | Bloomberg |
| 2023 | “They’re still going to kill everyone, of course” (on OpenAI) | | ⏳ Pending | Same |
Context:
- Altman credited Yudkowsky with “getting OpenAI funded” and suggested he deserved “a Nobel peace prize”
- In brief personal conversation, Altman opened with “What would you have OpenAI do?”
- Yudkowsky’s reply included: “Run on custom chips, so that, while the world is ending, it is harder for a dozen intelligence agencies to steal your code”
Marc Andreessen Opposition (2023)
| Date | Claim | Type | Status | Source |
|---|---|---|---|---|
| Mar 2023 | International treaty should permit military strikes against countries developing AI, even if “those countries are nuclear powers and the action risks nuclear war” | Op-ed/Twitter | N/A (policy proposal) | TIME |
Andreessen’s position:
- AI can “save the world”
- Biggest worry is “moral panic and overregulation”
- Repeatedly mocked Yudkowsky’s proposal for international enforcement
Predictions: Resolved
Early Career Predictions (Acknowledged Errors)
| Date | Claim | Type | Confidence | What Happened | Status | Source |
|---|---|---|---|---|---|---|
| 1996 | Singularity by 2021 (later revised to 2025) | Essay | High | No singularity occurred | ❌ Wrong | Staring into the Singularity |
| 1999 | 70%+ chance of human extinction from nanotechnology between 2003-2015 | Essay | High | No transformative nanotech | ❌ Clear miss | EA Forum |
| Pre-1999 | Transformative nanotechnology by 2010, leading to extinction by default | Essay | High | No transformative nanotech | ❌ Clear miss | Same |
| 2001 | His team would build “final stage AI” reaching transhumanity between 2005-2020, “probably around 2008 or 2010” | Essay | High | Did not happen | ❌ Major overconfidence | Same |
Context: These predictions were made when Yudkowsky was in his late teens/early twenties. He has acknowledged these were mistakes, and MIRI shifted from “building AGI” to “warning about AGI risks” after 2005.
Neural Networks and Deep Learning
| Date | Claim | Type | What Happened | Status | Source |
|---|---|---|---|---|---|
| 2008 | “Neural networks have also been ‘failing’ to produce real AI for 30 years now… I’m no fan of neurons” | Blog post | Deep learning revolution began ≈2012 | ❌ Timing wrong | Overcoming Bias |
| 2008 | NNs are “inscrutable black boxes which would be insanely difficult to make safe enough to entrust humanity-level power to compared to systems designed to be more mathematically tractable from the start” | Blog post | Current concern about interpretability | ⚠️ Concern validated, but NNs succeeded | Same |
| 2014-2017 | “Still didn’t believe neural networks were the royal road to AGI” | Various | NNs became primary path to current AI | ⚠️ Partially wrong | LessWrong |
Fair context: Almost everyone except Shane Legg (DeepMind co-founder) was wrong about deep learning’s potential and timing. As noted on LessWrong: “I don’t know how to convey how universal a sentiment this was, or how astonishingly unimpressive neural nets were in 2008.”
Scaling / “Stack More Layers”
| Date | Claim | Type | What Happened | Status | Source |
|---|---|---|---|---|---|
| Pre-GPT | Skeptical that simply scaling neural networks would produce impressive general capabilities | Various | GPT-3/4 showed remarkable scaling | ❌ Wrong | Dwarkesh Podcast |
| 2023 | “GPT-4 got further than I thought that stack more layers was going to get… therefore I have noticed this fact and expected further updates in the same direction” | Podcast | Updated position | ⚠️ Acknowledged update | Same |
| 2023 | LLMs are “a bit smarter than I thought this technology was going to scale to” | Podcast | Updated position | ⚠️ Acknowledged update | Same |
AlphaGo (2016) - Acknowledged Surprise
| Date | Claim | Type | What Happened | Status | Source |
|---|---|---|---|---|---|
| 2016 | Would not have predicted AlphaGo; “lost money betting against the speed of its capability gains” | Retrospective | AlphaGo succeeded | ❌ Surprised, but updated | Dwarkesh Podcast |
Quote: “I wouldn’t have predicted AlphaGo and lost money betting against the speed of its capability gains, because reality held a more extreme position than I did.”
IMO Gold Medal Bet with Paul Christiano (2021-2025)
| Date | Claim | Type | What Happened | Status | Source |
|---|---|---|---|---|---|
| 2021 | AI will achieve IMO gold by end of 2025 (>16% probability) | Formal bet | OpenAI and DeepMind both achieved gold-medal level performance (5/6 problems) in 2025 | ✅ Correct | LessWrong |
The bet:
- Christiano: 8% probability AI achieves IMO gold by end of 2025
- Yudkowsky: >16% probability
Assessment: Yudkowsky wins “1 bit of epistemic credit” — he was more bullish on AI math capabilities than Christiano.
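The “1 bit of epistemic credit” follows directly from comparing log scores on the stated probabilities, as in this small sketch (taking 16% as the lower bound Yudkowsky committed to):

```python
import math

# Relative log score, in bits, for the IMO-gold bet, assuming the event
# resolved "yes" and using the stated probabilities:
# Christiano 8%, Yudkowsky >16% (lower bound used here).
p_christiano = 0.08
p_yudkowsky = 0.16

credit_bits = math.log2(p_yudkowsky / p_christiano)
print(f"Epistemic credit: {credit_bits:.2f} bit")  # prints 1.00 bit
```

Had the bet resolved “no”, the sign would flip against the higher forecast, but by much less than a bit, since both stated probabilities were well below 50%.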
Predictions: Vindicated
AI Generalization (FOOM Debate with Robin Hanson, 2008)
This is arguably Yudkowsky’s most significant correct prediction:
| Yudkowsky’s Position | Hanson’s Position | Outcome | Source |
|---|---|---|---|
| Simple architectures would generalize broadly across domains | Would need many specialized systems for different domains | Yudkowsky correct - GPT demonstrates broad generalization | AI-FOOM Debate |
Yudkowsky’s own assessment: “Reality was far to the Eliezer side of Eliezer on the Eliezer-Robin axis, and things like GPT-3 were built with less architectural complexity and generalized more than I was arguing to Robin that complex architectures should generalize over domains.”
AI Safety Becoming Mainstream
Yudkowsky began warning about AI risks in the early 2000s when this was considered fringe. Today:
- OpenAI’s Sam Altman credits Yudkowsky with getting him interested in AGI
- DeepMind’s founders met their first major funder at a MIRI event
- Hundreds of AI researchers signed extinction risk statements
- Major governments developing AI safety regulations
Assessment: Clearly vindicated on raising the alarm.
Predictions: Pending/Unfalsifiable
P(doom) Evolution
| Period | Estimate | Type | Notes | Source |
|---|---|---|---|---|
| Early 2010s | ≈50% | Various | Initial estimates | EA Forum |
| 2022 | ≈100% (effectively) | Blog post | “Death with Dignity” post | LessWrong |
| 2023 | 99-99.5% | Interview | Told NYT columnist he had “99.5% chance of dying at the hands of AI” | Dwarkesh Podcast |
Key point: His p(doom) has increased over time, not decreased, even as AI safety gained mainstream attention. That is the opposite of the update critics expected the field’s growth to produce.
Assessment: Cannot be evaluated until AGI exists. His ≈99% estimate is roughly 7-20x higher than median AI researcher estimates (≈5-15%).
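To give a sense of scale for that gap, here is a rough log-odds comparison; the 10% figure is simply an illustrative point inside the cited 5-15% range:

```python
import math

def log_odds_bits(p: float) -> float:
    """Convert a probability into log-odds measured in bits."""
    return math.log2(p / (1 - p))

# Gap between a ~99% p(doom) and an illustrative ~10% estimate.
gap = log_odds_bits(0.99) - log_odds_bits(0.10)
print(f"Difference in log-odds: {gap:.1f} bits")  # roughly 9.8 bits apart
```

In Bayesian terms, closing that gap would require on the order of ten bits of evidence, i.e. observations roughly a thousand times more likely under one view than the other.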
Timeline Predictions (Current Stance)
| Date | Claim | Type | Status | Source |
|---|---|---|---|---|
| 2017 | “You could say the words ‘AGI is 50 years away’ and have those words happen to be true… The problem is that everything looks the same to you either way” | Formal essay | N/A (methodology) | MIRI: No Fire Alarm |
| 2023 | “It could be three years. It could be 15 years. We could get that AI winter I was hoping for, and it could be 16 years. I’m not really seeing 50 without some kind of giant civilizational catastrophe” | Podcast | ⏳ Pending | Dwarkesh Podcast |
| 2024 | “AGI by 2027 is plausible… because we are too ignorant to rule it out… because we have no idea what the distance is to human-level research” | | ⏳ Pending | |
| 2024 | Disagreed with Leopold Aschenbrenner: “Believe in straight lines on a graph” is not a valid basis for timeline predictions | | N/A (methodology) | |
Pattern: Shifted from confident early predictions to explicit timeline uncertainty while maintaining high confidence on catastrophic outcomes.
GPT-5 and Consciousness Prediction
| Date | Claim | Type | Status | Source |
|---|---|---|---|---|
| Mar 2023 | “If GPT-5 is the same size of giant capability step as from GPT-3 to GPT-4, I think we’ll no longer be able to justifiably say ‘probably not self-aware’… It’ll just be ‘I don’t know; nobody knows.’” | Interview | ⏳ Pending - GPT-5 not yet released | TIME |
Technical Claims (Pending)
| Claim | Type | Status | Source |
|---|---|---|---|
| RLHF won’t solve alignment—trains systems to make humans “hit approve button,” including via deception | Formal writing | Recognized limitation in literature; RLHF more useful than complete skeptics expected | Open Problems in RLHF |
| Deceptive alignment will emerge in capable systems | Formal writing | Not yet observed at dangerous scale | Various MIRI publications |
| Mesa-optimization is “highly likely” and is “still a loadbearing part of Eliezer Yudkowsky’s theory of how and why AI will kill everybody” | Formal writing | Theoretical concern, limited empirical evidence | Alignment Forum |
| Prefers “squiggle maximizers” over “paperclip maximizers” — AIs that pursue meaningless low-level patterns | Formal writing | Theoretical | Same |
| Instrumental convergence (power-seeking, self-preservation) will emerge | Formal writing | Theoretical; limited empirical tests | Same |
| “Nearest unblocked strategy” exploitation | Formal writing | Cannot be tested without highly capable systems | MIRI research |
Major Publications and Key Claims
“Death with Dignity” (April 1, 2022)
Type: Blog post (not an April Fools’ joke)
Key claims:
- “It’s obvious at this point that humanity isn’t going to solve the alignment problem, or even try very hard, or even go out with much of a fight”
- “Since survival is unattainable, we should shift the focus of our efforts to helping humanity die with slightly more dignity”
- “MIRI didn’t solve AGI alignment and at least knows that it didn’t”
“AGI Ruin: A List of Lethalities” (June 2022)
Type: Formal essay
Key claims:
- Most safety ideas “are known to be useless and have no value in terms of creating safe powerful AGIs”
- “We have no plan for how to do anything useful”
- “We have no idea what the hell is going on with these systems”
- “We can’t just ‘decide not to build AGI’ because GPUs are everywhere”
- “No difficulty discussed about AGI alignment is claimed to be impossible if we had 100 years to solve it using unlimited retries”
- “You can’t train alignment by running lethally dangerous cognitions, observing whether the outputs kill or deceive or corrupt the operators, assigning a loss, and doing supervised learning”
TIME Magazine Op-Ed (March 2023)
Type: Op-ed
Key claims:
- “The most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die”
- Called for international moratorium on large AI training runs
- Proposed that participating countries should be willing to take military action, such as “destroy[ing] a rogue datacenter by airstrike,” to enforce such a moratorium
Clarification tweet: “If I’d meant ‘Be willing to employ first use of nuclear weapons against a country for refusing to sign the agreement,’ or even ‘Use nukes to destroy rogue datacenters, instead of conventional weapons, for some unimaginable reason,’ I’d have said that, in words, very clearly”
Reception: Even other AI safety researchers considered the proposals extreme; LessWrong commenters noted Yudkowsky “must have known this letter was akin to getting blacklisted everywhere”
“If Anyone Builds It, Everyone Dies” (2025 book with Nate Soares)
Type: Book
Key claims:
- “We do not mean that as hyperbole”
- If anyone anywhere builds smarter-than-human intelligence, “everyone everywhere will die—with near certainty, and soon”
- AI training “does not achieve the original goal of understanding how intelligence works”
- Training is “more akin to providing water, soil, and sunlight and letting a plant grow, without needing to know much about DNA or photosynthesis”
P(doom): Yudkowsky 99.5%, Soares “above 95%”
Notable Statements and Quotes
Heated/Controversial Tweets
| Date | Statement | Type | Context | Source |
|---|---|---|---|---|
| Mar 2023 | When asked how many people could die to stop unaligned AI, replied there only needed to be enough people “to form a viable reproductive population. So long as that’s true, there’s still a chance of reaching the stars someday” | Twitter (deleted) | Heated exchange | France24 |
| May 2023 | “Fools often misrepresent me as saying that superintelligence can do anything because magic. To clearly show this false, here’s a concrete list of stuff I expect superintelligence can or can’t do: FTL travel: DEFINITE NO…” | Twitter | Clarification | |
| Various | “Any sufficiently complicated legal system is indistinguishable from saying ‘lol fuck you’ to all the peasants who can’t afford lawyers when a noble rips them off” | Twitter | Social commentary | |
| Various | “Anyway this is what makes it so hard for me to not start cults. Like, I can choose not to lead cults. That’s easy. But not having one cult per three months just materialize in the wake of my existence is weirdly hard” | Twitter | Self-deprecating | |
Podcast Interviews - Key Quotes
Dwarkesh Patel Podcast (April 2023)
| Topic | Quote | Type |
|---|---|---|
| On predictions | Refused timelines and percentages: “they make me dumber” | Podcast |
| On methodology | “You act on the best plan you have, with whatever time remains; a number adds nothing to the work” | Podcast |
| On outcome | “I think we are all going to die” | Podcast |
| On GPT-4 | “It is a bit smarter than I thought this technology was going to scale to” | Podcast |
| On AI consciousness | “I hope there’s nobody inside there, because… it’d suck to be stuck inside there” | Podcast |
| On interpretability | “Giant inscrutable matrices of floating point numbers, I don’t know what’s going on in there. Nobody knows what’s going on in there” | Podcast |
| On using AI for alignment | Pessimistic — scenario where AI “hands you something claiming it will work for aligning superintelligence, gives early predictions that bear out, but then when you augment the system further to where its safety depends on its alignment, you die” | Podcast |
Lex Fridman Podcast #368 (March 2023)
| Topic | Quote | Type |
|---|---|---|
| On alignment | “If alignment plays out the same way, the problem is that we do not get 50 years to try and try again and observe that we were wrong… because the first time you fail at aligning something much smarter than you are, you die and you do not get to try again” | Podcast |
| Advice to young people | “Don’t expect it to be a long life. Don’t put your happiness into the future. The future is probably not that long at this point” | Podcast |
EconTalk with Russ Roberts (May 2023)
| Topic | Quote | Type |
|---|---|---|
| On AI motivation | “Put yourself in the shoes of the AI, like an economist putting themselves into the shoes of something that’s about to have a tax imposed on it. What do you do if you’re around humans who can potentially unplug you?” | Podcast |
Reception: Described as a difficult listen; “Yudkowsky has developed a dense jargon” that even ML PhD students found hard to follow
UFO Bet
| Date | Bet | Type | Status | Source |
|---|---|---|---|---|
| 2023 | $150,000 vs $1,000 bet that UFOs do not have a “worldview-shattering origin” | Formal bet | ⏳ Pending | Manifold Markets |
Context: Demonstrates willingness to make concrete bets on predictions with significant personal stakes.
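Under a simple risk-neutral reading (ignoring counterparty risk and the diminishing utility of money), the stakes imply an upper bound on the probability Yudkowsky assigns to such an origin:

```python
# Yudkowsky risks $150,000 to win $1,000 that UFOs do NOT have a
# "worldview-shattering origin". The bet only has positive expected value
# for him if his probability of that origin is below stake_won / total.
stake_risked = 150_000
stake_won = 1_000

implied_ceiling = stake_won / (stake_risked + stake_won)
print(f"Implied p(worldview-shattering origin) < {implied_ceiling:.3%}")  # < 0.662%
```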
Accuracy Analysis
Where Yudkowsky tends to be right:
- Conceptual arguments about AI generalization vs. narrow specialization
- Raising concerns that later became mainstream (AI safety field creation)
- Identifying theoretical problems (interpretability, alignment difficulty)
- Updating on evidence (acknowledged AlphaGo, GPT surprises)
- Betting more bullishly on AI capabilities than peers (IMO bet)
Where Yudkowsky tends to be wrong:
- Specific timeline predictions (especially early career)
- Confidence in his own/MIRI’s ability to solve alignment
- Initial skepticism about neural network scaling
- Nanotech predictions (pre-1999)
Confidence calibration:
- Early career: Severe overconfidence on timelines
- Later: Shifted to explicit uncertainty on timelines while maintaining high confidence on outcomes
- Rarely uses formal forecasting that would enable calibration tracking (a rough sketch of what such tracking could look like follows below)
- Tendency toward “dramatic views with excessive confidence” per some critics
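A minimal sketch of that kind of tracking, using two explicitly quantified forecasts from the tables above that have since resolved; this is illustrative only, not a calibration verdict:

```python
# Brier score over explicitly stated, resolved probabilities. The two entries
# are drawn from the tables above; a real analysis would need a much fuller
# record of dated, numeric forecasts.
resolved = [
    (0.16, 1),  # ">16%" on IMO gold by end of 2025; it happened
    (0.70, 0),  # "70%+" on nanotech extinction 2003-2015; it did not happen
]

brier = sum((p - outcome) ** 2 for p, outcome in resolved) / len(resolved)
print(f"Brier score: {brier:.2f}")  # lower is better; a constant 50% guess scores 0.25
```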
Position Evolution
Unlike some figures, Yudkowsky has shown meaningful updates:
| Topic | Earlier Position | Current Position | What Changed |
|---|---|---|---|
| Timelines | Confident predictions (Singularity 2021/2025) | Explicit uncertainty (“can’t be timed by a graph”) | Acknowledged early errors |
| Deep learning | Skeptical it would work | Acknowledges it went further than expected | GPT capabilities |
| P(doom) | ≈50% (2010s) | ≈99% (2023) | Increased despite safety field growth |
| MIRI approach | Could solve alignment | “Death with dignity” - doesn’t expect success | Pessimism increased |
| Scaling | “Stack more layers” won’t work | Updated that it went further than expected | GPT-4 |
The Core Challenge
Yudkowsky’s most important predictions (catastrophic AI risk leading to human extinction) are unfalsifiable until AGI exists. This creates an epistemic difficulty:
- If he’s wrong, we won’t know until AGI is built safely
- If he’s right, it may be too late to matter
- His ≈99% p(doom) is dramatically higher than most expert estimates
As one observer noted: “Eliezer raises many good considerations backed by pretty clear arguments, but makes confident assertions that are much stronger than anything suggested by actual argument.”
Sources
- EA Forum: On Deference and Yudkowsky’s AI Risk Estimates
- LessWrong: Yudkowsky vs Hanson on FOOM
- LessWrong: When did Eliezer change his mind about neural networks?
- TIME: The Only Way to Deal With AI? Shut It Down
- MIRI: Death with Dignity
- Alignment Forum: Where I agree and disagree with Eliezer
- Dwarkesh Podcast Interview
- MIRI: Hanson-Yudkowsky AI-FOOM Debate
- Scientific American Interview
- Zvi: Transcript and Response to LeCun Debate
- Zvi: Summary of Hotz-Yudkowsky Debate
- LessWrong: IMO Challenge Bet
- LessWrong: Ngo and Yudkowsky on Alignment Difficulty
- Bloomberg: Altman-Yudkowsky Relationship
- France24: The AI Debate
- MIRI: There’s No Fire Alarm for AGI