
Eliezer Yudkowsky: Track Record

This page documents Eliezer Yudkowsky’s public predictions and claims to assess his epistemic track record. His record is genuinely mixed: significant errors early in his career (particularly timelines), initial skepticism of deep learning (common at the time), but notable vindication on conceptual points about AI generalization.

Category | Count | Notes
--- | --- | ---
Clearly Correct | 3-4 | AI generalization with simple architectures, AI safety becoming mainstream, IMO gold medal bet
Partially Correct | 3-4 | Interpretability challenges, RLHF limitations, mesa-optimization concerns
Clearly Wrong | 4-5 | Early timeline predictions (Singularity by 2021), deep learning skepticism timing, nanotech predictions
Pending/Unfalsifiable | 6+ | P(doom) estimates, discontinuous takeoff, deceptive alignment, GPT-5 consciousness

Overall pattern: Made significant errors when young (early timeline predictions); updated to timeline agnosticism; vindicated on AI generalization question in Hanson debate; core doom predictions remain unfalsifiable until AGI exists.


Yann LeCun Debates (2023-2024)

Date | Claim | Type | Status | Source
--- | --- | --- | --- | ---
Apr 2023 | LeCun’s proposed architecture doesn’t address alignment | Twitter debate | ⏳ Ongoing dispute | LessWrong
Apr 2023 | “Talking about that falling asteroid will depress high-school students isn’t a good reason not to talk about the asteroid” | Twitter debate | N/A (position statement) | Zvi’s Analysis

Key exchanges:

  • LeCun called Yudkowsky’s arguments “vague hand-waving arguments” lacking technical rigor
  • LeCun claimed his architecture “is a way to guarantee that AI systems be steerable and aligned”
  • Yudkowsky: “A quick skim of [Yann LeCun’s 60 page paper] showed nothing about alignment”
  • LeCun: “The ‘hard take-off’ scenario is utterly impossible”
  • Yudkowsky: “My objection is not that you’re staking everyone’s life on what you believe… but that you are staking everyone’s life on propositions that seem not just uncertain but probably false”

2024 continuation: LeCun stated MIRI’s “main goal is to shut down AI development” and compared it to “apocalyptic and survivalist cults.”


Robin Hanson FOOM Debate (2008)

Date | Claim | Type | Status | Source
--- | --- | --- | --- | ---
2008 | Recursive self-improvement enables rapid intelligence explosion | Formal debate | ⏳ Pending | MIRI
2008 | Simple architectures will generalize broadly across domains | Formal debate | ✅ Vindicated (GPT) | LessWrong
2008 | A “small project machine in a basement” could become powerful enough to take over the world over a weekend | Formal debate | ⏳ Pending | AI-FOOM Debate

Yudkowsky’s position:

  • Recursive self-improvement would enable rapid intelligence explosion
  • Simple architectures would generalize broadly across domains

Hanson’s position:

  • Progress would be slow and gradual
  • Would need many specialized systems for different domains
  • AI self-improvement would be incremental like human R&D

Outcome assessment:

  • Yudkowsky claimed vindication: “Robin did not think something like GPT-3 should exist; Robin thought you should need to train lots of specific domains that didn’t generalize”
  • Yudkowsky: “reality was far to the Eliezer side of Eliezer on the Eliezer-Robin axis”
  • LessWrong analysis concluded: “when you examine these predictions, Hanson probably does a little better than Yudkowsky” on object-level predictions

Paul Christiano Disagreements (2015-ongoing)

Date | Claim | Type | Status | Source
--- | --- | --- | --- | ---
2015+ | AI takeoff will be discontinuous/fast | Formal writing | ⏳ Pending | Alignment Forum
2015+ | Alignment difficulty is high enough that current approaches won’t work | Formal writing | ⏳ Pending | Same
2021 | AI will achieve IMO gold by end of 2025 (>16% probability) | Formal bet | ✅ Correct | LessWrong

Key disagreements:

  1. Takeoff speeds:

    • Christiano: “AI improving itself is most likely to look like AI systems doing R&D in the same way that humans do”
    • Yudkowsky expects “AI systems performing extremely fast recursive self-improvement before those systems are able to make superhuman contributions to other domains”
  2. Difficulty of alignment:

    • Christiano: “Eliezer seems confident about the difficulty of alignment based largely on his own experiences working on the problem. But in fact society has spent very little total effort working on the problem”
    • Christiano: “MIRI itself would probably be unable to solve or even make significant progress on the large majority of problems that existing research fields routinely solve”
  3. Epistemic confidence:

    • Christiano: “Eliezer is not doing the type of reasoning that can justifiably defend the level of confidence he claims to have”

Areas of agreement:

  • AI systems could “deliberately and irreversibly disempower humanity”
  • “There won’t necessarily be a ‘fire alarm’”
  • Many alignment projects “aren’t making progress on key difficulties”

George Hotz Debate (August 2023)

Date | Claim | Type | Status | Source
--- | --- | --- | --- | ---
Aug 2023 | Sufficiently intelligent ASI systems will be “suns to our planets”: so intelligent they are inherently inscrutable and uncontrollable | Livestream debate | ⏳ Pending | Zvi Summary
Aug 2023 | First move of superintelligence would be to “take out the humans” because “humans can build other ASIs” | Livestream debate | ⏳ Pending | Same

Hotz’s position:

  • Over 10 years from 2023, it is “not possible for a superintelligence to exist and gain the hard power to kill humanity”
  • AIs will be “chill” and provide benefits like self-driving cars

Key exchange:

  • On singularity proximity: “Hotz believes we’re not close, Yudkowsky thinks it’s possible we’re close. They agree that if we’re close we’re toast”

Analysis: Debate “went quite well for the first half or so, then things went increasingly off the rails in the second half”


Richard Ngo Conversations (2021-2022)

Date | Claim | Type | Status | Source
--- | --- | --- | --- | ---
2021-22 | The hope that “maybe there’s an easy way” to safely use AI is wrong | Discord/MIRI conversations | ⏳ Pending | LessWrong

Key disagreements:

  • Ngo: “Eliezer is not doing the type of reasoning that can justifiably defend the level of confidence he claims to have”
  • Ngo pressed for predictions rather than postdictions; “the extent to which Eliezer seemed confused that I cared about this was a noticeable update for me”

Sam Altman Exchanges (2022-2023)

Date | Claim | Type | Status | Source
--- | --- | --- | --- | ---
2022-23 | OpenAI should run on custom chips to prevent code theft | In-person conversation | N/A (advice) | Bloomberg
2023 | “They’re still going to kill everyone, of course” (on OpenAI) | Twitter | ⏳ Pending | Same

Context:

  • Altman credited Yudkowsky with “getting OpenAI funded” and suggested he deserved “a Nobel peace prize”
  • In brief personal conversation, Altman opened with “What would you have OpenAI do?”
  • Yudkowsky’s reply included: “Run on custom chips, so that, while the world is ending, it is harder for a dozen intelligence agencies to steal your code”

Marc Andreessen Criticism (2023)

Date | Claim | Type | Status | Source
--- | --- | --- | --- | ---
Mar 2023 | International treaty should permit military strikes against countries developing AI, even if “those countries are nuclear powers and the action risks nuclear war” | Op-ed/Twitter | N/A (policy proposal) | TIME

Andreessen’s position:

  • AI can “save the world”
  • Biggest worry is “moral panic and overregulation”
  • Repeatedly mocked Yudkowsky’s proposal for international enforcement

Early Career Predictions (Acknowledged Errors)

Date | Claim | Type | Confidence | What Happened | Status | Source
--- | --- | --- | --- | --- | --- | ---
1996 | Singularity by 2021 (later revised to 2025) | Essay | High | No singularity occurred | ❌ Wrong | Staring into the Singularity
1999 | 70%+ chance of human extinction from nanotechnology between 2003-2015 | Essay | High | No transformative nanotech | ❌ Clear miss | EA Forum
Pre-1999 | Transformative nanotechnology by 2010, leading to extinction by default | Essay | High | No transformative nanotech | ❌ Clear miss | Same
2001 | His team would build “final stage AI” reaching transhumanity between 2005-2020, “probably around 2008 or 2010” | Essay | High | Did not happen | ❌ Major overconfidence | Same

Context: These predictions were made when Yudkowsky was in his late teens/early twenties. He has acknowledged these were mistakes, and MIRI shifted from “building AGI” to “warning about AGI risks” after 2005.


Deep Learning Skepticism (2008-2017)

Date | Claim | Type | What Happened | Status | Source
--- | --- | --- | --- | --- | ---
2008 | “Neural networks have also been ‘failing’ to produce real AI for 30 years now… I’m no fan of neurons” | Blog post | Deep learning revolution began ≈2012 | ❌ Timing wrong | Overcoming Bias
2008 | NNs are “inscrutable black boxes which would be insanely difficult to make safe enough to entrust humanity-level power to compared to systems designed to be more mathematically tractable from the start” | Blog post | Current concern about interpretability | ⚠️ Concern validated, but NNs succeeded | Same
2014-2017 | “Still didn’t believe neural networks were the royal road to AGI” | Various | NNs became primary path to current AI | ⚠️ Partially wrong | LessWrong

Fair context: Almost everyone except Shane Legg (DeepMind co-founder) was wrong about deep learning’s potential and timing. As noted on LessWrong: “I don’t know how to convey how universal a sentiment this was, or how astonishingly unimpressive neural nets were in 2008.”


Scaling Skepticism and Updates

Date | Claim | Type | What Happened | Status | Source
--- | --- | --- | --- | --- | ---
Pre-GPT | Skeptical that simply scaling neural networks would produce impressive general capabilities | Various | GPT-3/4 showed remarkable scaling | ❌ Wrong | Dwarkesh Podcast
2023 | “GPT-4 got further than I thought that stack more layers was going to get… therefore I have noticed this fact and expected further updates in the same direction” | Podcast | Updated position | ⚠️ Acknowledged update | Same
2023 | LLMs are “a bit smarter than I thought this technology was going to scale to” | Podcast | Updated position | ⚠️ Acknowledged update | Same

AlphaGo (2016)

Date | Claim | Type | What Happened | Status | Source
--- | --- | --- | --- | --- | ---
2016 | Would not have predicted AlphaGo; “lost money betting against the speed of its capability gains” | Retrospective | AlphaGo succeeded | ❌ Surprised, but updated | Dwarkesh Podcast

Quote: “I wouldn’t have predicted AlphaGo and lost money betting against the speed of its capability gains, because reality held a more extreme position than I did.”


IMO Gold Medal Bet with Paul Christiano (2021-2025)

Date | Claim | Type | What Happened | Status | Source
--- | --- | --- | --- | --- | ---
2021 | AI will achieve IMO gold by end of 2025 (>16% probability) | Formal bet | OpenAI and DeepMind both achieved gold-medal level performance (5/6 problems) in 2025 | ✅ Correct | LessWrong

The bet:

  • Christiano: 8% probability AI achieves IMO gold by end of 2025
  • Yudkowsky: >16% probability

Assessment: Yudkowsky wins “1 bit of epistemic credit” — he was more bullish on AI math capabilities than Christiano.
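As a rough illustration (not from the source), the “1 bit” figure follows directly from the two stated probabilities: conditional on the event occurring, the log-score gap between a 16% and an 8% forecast is log2(0.16/0.08) = 1 bit. A minimal sketch, assuming Yudkowsky’s “>16%” is treated as exactly 16%:

```python
import math

# Stated probabilities from the bet (Yudkowsky's ">16%" treated as exactly 16%).
p_yudkowsky = 0.16
p_christiano = 0.08

# With the event resolved true, the difference in log scores is
# log2(p_yudkowsky) - log2(p_christiano) = log2(p_yudkowsky / p_christiano).
bits_of_credit = math.log2(p_yudkowsky / p_christiano)
print(f"Epistemic credit: {bits_of_credit:.2f} bit(s)")  # -> 1.00 bit(s)
```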


AI Generalization (FOOM Debate with Robin Hanson, 2008)


This is arguably Yudkowsky’s most significant correct prediction:

Yudkowsky’s Position | Hanson’s Position | Outcome | Source
--- | --- | --- | ---
Simple architectures would generalize broadly across domains | Would need many specialized systems for different domains | Yudkowsky correct: GPT demonstrates broad generalization | AI-FOOM Debate

Yudkowsky’s own assessment: “Reality was far to the Eliezer side of Eliezer on the Eliezer-Robin axis, and things like GPT-3 were built with less architectural complexity and generalized more than I was arguing to Robin that complex architectures should generalize over domains.”


AI Safety Becoming Mainstream

Yudkowsky began warning about AI risks in the early 2000s, when this was considered fringe. Today:

  • OpenAI’s Sam Altman credits Yudkowsky with getting him interested in AGI
  • DeepMind’s founders met their first major funder at a MIRI event
  • Hundreds of AI researchers signed extinction risk statements
  • Major governments developing AI safety regulations

Assessment: Clearly vindicated on raising the alarm.


P(doom) Estimates

Period | Estimate | Type | Notes | Source
--- | --- | --- | --- | ---
Early 2010s | ≈50% | Various | Initial estimates | EA Forum
2022 | ≈100% (effectively) | Blog post | “Death with Dignity” post | LessWrong
2023 | 99-99.5% | Interview | Told NYT columnist he had “99.5% chance of dying at the hands of AI” | Dwarkesh Podcast

Key point: His p(doom) has increased over time, not decreased, even as AI safety gained mainstream attention. This is the opposite of what critics expected.

Assessment: Cannot be evaluated until AGI exists. His ≈99% estimate is roughly 7-20x the median AI researcher estimate (≈5-15%).
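For scale, an illustrative calculation (not from the source): how a ≈99% estimate compares with a 5-15% median, both as a raw probability ratio and as an odds ratio, which is the more natural measure of the gap for forecasts near the extremes:

```python
def odds(p):
    """Convert a probability to odds in favor."""
    return p / (1 - p)

p_yudkowsky = 0.99
for p_median in (0.05, 0.15):
    prob_ratio = p_yudkowsky / p_median
    odds_ratio = odds(p_yudkowsky) / odds(p_median)
    print(f"vs median {p_median:.0%}: {prob_ratio:.0f}x in probability, "
          f"~{odds_ratio:.0f}x in odds")
# vs median 5%: 20x in probability, ~1881x in odds
# vs median 15%: 7x in probability, ~561x in odds
```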


AGI Timeline Statements

Date | Claim | Type | Status | Source
--- | --- | --- | --- | ---
2017 | “You could say the words ‘AGI is 50 years away’ and have those words happen to be true… The problem is that everything looks the same to you either way” | Formal essay | N/A (methodology) | MIRI: No Fire Alarm
2023 | “It could be three years. It could be 15 years. We could get that AI winter I was hoping for, and it could be 16 years. I’m not really seeing 50 without some kind of giant civilizational catastrophe” | Podcast | ⏳ Pending | Dwarkesh Podcast
2024 | “AGI by 2027 is plausible… because we are too ignorant to rule it out… because we have no idea what the distance is to human-level research” | Twitter | ⏳ Pending | Twitter
2024 | Disagreed with Leopold Aschenbrenner: “Believe in straight lines on a graph” is not a valid basis for timeline predictions | Twitter | N/A (methodology) | Twitter

Pattern: Shifted from confident early predictions to explicit timeline uncertainty while maintaining high confidence on catastrophic outcomes.


Date | Claim | Type | Status | Source
--- | --- | --- | --- | ---
Mar 2023 | “If GPT-5 is the same size of giant capability step as from GPT-3 to GPT-4, I think we’ll no longer be able to justifiably say ‘probably not self-aware’… It’ll just be ‘I don’t know; nobody knows.’” | Interview | ⏳ Pending (GPT-5 not yet released) | TIME

Claim | Type | Status | Source
--- | --- | --- | ---
RLHF won’t solve alignment: trains systems to make humans “hit approve button,” including via deception | Formal writing | Recognized limitation in literature; RLHF more useful than complete skeptics expected | Open Problems in RLHF
Deceptive alignment will emerge in capable systems | Formal writing | Not yet observed at dangerous scale | Various MIRI publications
Mesa-optimization is “highly likely” and is “still a loadbearing part of Eliezer Yudkowsky’s theory of how and why AI will kill everybody” | Formal writing | Theoretical concern, limited empirical evidence | Alignment Forum
Prefers “squiggle maximizers” over “paperclip maximizers”: AIs that pursue meaningless low-level patterns | Formal writing | Theoretical | Same
Instrumental convergence (power-seeking, self-preservation) will emerge | Formal writing | Theoretical; limited empirical tests | Same
“Nearest unblocked strategy” exploitation | Formal writing | Cannot be tested without highly capable systems | MIRI research

“Death with Dignity” (April 2022)

Type: Blog post (Not an April Fools’ joke)

Key claims:

  • “It’s obvious at this point that humanity isn’t going to solve the alignment problem, or even try very hard, or even go out with much of a fight”
  • “Since survival is unattainable, we should shift the focus of our efforts to helping humanity die with slightly more dignity”
  • “MIRI didn’t solve AGI alignment and at least knows that it didn’t”

"AGI Ruin: A List of Lethalities” (June 2022)


Type: Formal essay

Key claims:

  1. Most safety ideas “are known to be useless and have no value in terms of creating safe powerful AGIs”
  2. “We have no plan for how to do anything useful”
  3. “We have no idea what the hell is going on with these systems”
  4. “We can’t just ‘decide not to build AGI’ because GPUs are everywhere”
  5. “No difficulty discussed about AGI alignment is claimed to be impossible if we had 100 years to solve it using unlimited retries”
  6. “You can’t train alignment by running lethally dangerous cognitions, observing whether the outputs kill or deceive or corrupt the operators, assigning a loss, and doing supervised learning”

TIME Op-Ed (March 2023)

Type: Op-ed

Key claims:

  1. “The most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die”
  2. Called for international moratorium on large AI training runs
  3. Proposed that participating countries should be willing to take military action, such as “destroy[ing] a rogue datacenter by airstrike,” to enforce such a moratorium

Clarification tweet: “If I’d meant ‘Be willing to employ first use of nuclear weapons against a country for refusing to sign the agreement,’ or even ‘Use nukes to destroy rogue datacenters, instead of conventional weapons, for some unimaginable reason,’ I’d have said that, in words, very clearly”

Reception: Even other AI safety researchers considered the proposals extreme; LessWrong commenters noted Yudkowsky “must have known this letter was akin to getting blacklisted everywhere”


"If Anyone Builds It, Everyone Dies” (2025 book with Nate Soares)


Type: Book

Key claims:

  • “We do not mean that as hyperbole”
  • If anyone anywhere builds smarter-than-human intelligence, “everyone everywhere will die—with near certainty, and soon”
  • AI training “does not achieve the original goal of understanding how intelligence works”
  • Training is “more akin to providing water, soil, and sunlight and letting a plant grow, without needing to know much about DNA or photosynthesis”

P(doom): Yudkowsky 99.5%, Soares “above 95%”


Notable Public Statements

Date | Statement | Type | Context | Source
--- | --- | --- | --- | ---
Mar 2023 | When asked how many people could die to stop unaligned AI, replied there only needed to be enough people “to form a viable reproductive population. So long as that’s true, there’s still a chance of reaching the stars someday” | Twitter (deleted) | Heated exchange | France24
May 2023 | “Fools often misrepresent me as saying that superintelligence can do anything because magic. To clearly show this false, here’s a concrete list of stuff I expect superintelligence can or can’t do: FTL travel: DEFINITE NO…” | Twitter | Clarification | Twitter
Various | “Any sufficiently complicated legal system is indistinguishable from saying ‘lol fuck you’ to all the peasants who can’t afford lawyers when a noble rips them off” | Twitter | Social commentary | Twitter
Various | “Anyway this is what makes it so hard for me to not start cults. Like, I can choose not to lead cults. That’s easy. But not having one cult per three months just materialize in the wake of my existence is weirdly hard” | Twitter | Self-deprecating | Twitter

Topic | Quote | Type
--- | --- | ---
On predictions | Refused timelines and percentages: “they make me dumber” | Podcast
On methodology | “You act on the best plan you have, with whatever time remains; a number adds nothing to the work” | Podcast
On outcome | “I think we are all going to die” | Podcast
On GPT-4 | “It is a bit smarter than I thought this technology was going to scale to” | Podcast
On AI consciousness | “I hope there’s nobody inside there, because… it’d suck to be stuck inside there” | Podcast
On interpretability | “Giant inscrutable matrices of floating point numbers, I don’t know what’s going on in there. Nobody knows what’s going on in there” | Podcast
On using AI for alignment | Pessimistic: scenario where AI “hands you something claiming it will work for aligning superintelligence, gives early predictions that bear out, but then when you augment the system further to where its safety depends on its alignment, you die” | Podcast

Topic | Quote | Type
--- | --- | ---
On alignment | “If alignment plays out the same way, the problem is that we do not get 50 years to try and try again and observe that we were wrong… because the first time you fail at aligning something much smarter than you are, you die and you do not get to try again” | Podcast
Advice to young people | “Don’t expect it to be a long life. Don’t put your happiness into the future. The future is probably not that long at this point” | Podcast

Topic | Quote | Type
--- | --- | ---
On AI motivation | “Put yourself in the shoes of the AI, like an economist putting themselves into the shoes of something that’s about to have a tax imposed on it. What do you do if you’re around humans who can potentially unplug you?” | Podcast

Reception: Described as a difficult listen; “Yudkowsky has developed a dense jargon” that even ML PhD students found hard to follow


UFO Bet (2023)

Date | Bet | Type | Status | Source
--- | --- | --- | --- | ---
2023 | $150,000 vs $1,000 bet that UFOs do not have a “worldview-shattering origin” | Formal bet | ⏳ Pending | Manifold Markets

Context: Demonstrates willingness to make concrete bets on predictions with significant personal stakes.


Where Yudkowsky tends to be right:

  • Conceptual arguments about AI generalization vs. narrow specialization
  • Raising concerns that later became mainstream (AI safety field creation)
  • Identifying theoretical problems (interpretability, alignment difficulty)
  • Updating on evidence (acknowledged AlphaGo, GPT surprises)
  • Betting more bullishly on AI capabilities than peers (IMO bet)

Where Yudkowsky tends to be wrong:

  • Specific timeline predictions (especially early career)
  • Confidence in his own/MIRI’s ability to solve alignment
  • Initial skepticism about neural network scaling
  • Nanotech predictions (pre-1999)

Confidence calibration:

  • Early career: Severe overconfidence on timelines
  • Later: Shifted to explicit uncertainty on timelines while maintaining high confidence on outcomes
  • Rarely uses formal forecasting that would enable calibration tracking (see the sketch after this list for what such tracking looks like)
  • Tendency toward “dramatic views with excessive confidence” per some critics
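As an illustration of the calibration tracking referred to above (the entries below are invented placeholders, not Yudkowsky’s actual forecasts), a Brier score over resolved yes/no predictions is the simplest such measure:

```python
# Hypothetical forecasts, each as (stated probability, outcome: 1 = happened, 0 = didn't).
# These numbers are placeholders for illustration only.
forecasts = [
    (0.16, 1),  # e.g. a >16%-style capability prediction that resolved true
    (0.70, 0),  # a hypothetical confident prediction that resolved false
    (0.50, 1),  # a hypothetical coin-flip call that resolved true
]

# Brier score: mean squared error between stated probability and outcome.
# 0.0 is a perfect score; always answering 50% scores 0.25.
brier = sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)
print(f"Brier score: {brier:.3f}")  # lower is better
```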

Unlike some figures, Yudkowsky has shown meaningful updates:

Topic | Earlier Position | Current Position | What Changed
--- | --- | --- | ---
Timelines | Confident predictions (Singularity 2021/2025) | Explicit uncertainty (“can’t be timed by a graph”) | Acknowledged early errors
Deep learning | Skeptical it would work | Acknowledges it went further than expected | GPT capabilities
P(doom) | ≈50% (2010s) | ≈99% (2023) | Increased despite safety field growth
MIRI approach | Could solve alignment | “Death with dignity”: doesn’t expect success | Pessimism increased
Scaling | “Stack more layers” won’t work | Updated that it went further than expected | GPT-4

Yudkowsky’s most important predictions (catastrophic AI risk leading to human extinction) are unfalsifiable until AGI exists. This creates an epistemic difficulty:

  • If he’s wrong, we won’t know until AGI is built safely
  • If he’s right, it may be too late to matter
  • His ≈99% p(doom) is dramatically higher than most expert estimates

As one observer noted: “Eliezer raises many good considerations backed by pretty clear arguments, but makes confident assertions that are much stronger than anything suggested by actual argument.”