Eliezer Yudkowsky

Person

Comprehensive biographical profile of Eliezer Yudkowsky covering his foundational contributions to AI safety (CEV, early problem formulation, agent foundations) and notably pessimistic views on AI risk. Includes detailed 'Statements & Track Record' section analyzing his mixed prediction accuracy—noting early timeline errors, his position on AI generalization in the Hanson debate, and the unfalsifiability of his core doom predictions.

Role: Co-founder & Research Fellow
Known For: Early AI safety work, decision theory, rationalist community

Related
Organizations: Machine Intelligence Research Institute
Risks: Deceptive Alignment · Sharp Left Turn
People: Paul Christiano

Source | Link
Official Website | yudkowsky.net
Wikipedia | en.wikipedia.org
Wikidata | wikidata.org

Quick Assessment

Dimension | Assessment
Primary Role | Senior researcher and co-founder at MIRI; public advocate for AI risk policy
Key Contributions | Early articulation of the AI alignment problem; Coherent Extrapolated Volition (2004); founding of LessWrong; extensive writing on decision theory and rationality
Key Publications | "AGI Ruin: A List of Lethalities" (2022); "Pausing AI Developments Isn't Enough. We Need to Shut it All Down" (TIME, 2023); Harry Potter and the Methods of Rationality (2010–2015); the Sequences (2006–2009)
Institutional Affiliation | Machine Intelligence Research Institute (MIRI), Berkeley, California
Influence on AI Safety | Among the earliest researchers to systematically formulate the AI alignment problem; his writings and community-building through LessWrong influenced a generation of AI safety researchers and, through them, major funders including Coefficient Giving (formerly Open Philanthropy)

Overview

Eliezer Yudkowsky (born September 11, 1979, in Chicago, Illinois) is an American AI researcher, writer, and co-founder of the Machine Intelligence Research Institute (MIRI).1 He is a central figure in the development of AI safety as a research area, having begun writing on the risks of Superintelligence in the late 1990s, well before the field attracted mainstream academic or institutional attention.

Yudkowsky is entirely self-educated, having not attended high school or university.1 His intellectual development emerged through independent study and participation in early online intellectual communities. This unconventional background shapes both his methodology—he has worked largely outside formal academic structures—and the reception of his work, which ranges from highly influential within the AI safety community to contested among mainstream AI researchers.

His views on AI risk have become more pessimistic over time. He has publicly argued that the default trajectory of AI development leads to human extinction, and his policy positions—including calls for an indefinite international moratorium on large AI training runs—have attracted significant media attention, particularly following a March 2023 op-ed in TIME magazine that was the most-viewed page on that site for a week.2

Background

Early Life and Self-Education

Yudkowsky was born on September 11, 1979, in Chicago, Illinois, and raised in a Jewish household.1 He did not attend high school or university, describing himself as entirely self-educated. His intellectual development proceeded through independent reading and participation in early online communities, including mailing lists focused on transhumanism and rationalism in the mid-to-late 1990s.3

He became interested in artificial intelligence as a teenager after reading about efforts to enhance human capabilities through technology.3 His early writings from this period, including postings to extropian mailing lists, reflect an initial optimism about the Singularity that he would later substantially revise.

Founding of MIRI (2000)

In 2000, Yudkowsky co-founded the Singularity Institute for Artificial Intelligence (SIAI) with Brian Atkins and Sabine Atkins (then Sabine Stoeckel), with initial funding provided by the Atkinses.4 The institute was formally incorporated on July 27, 2000, under Georgia nonprofit law and recognized as a 501(c)(3) organization.4 Its original stated purpose was to accelerate the development of artificial general intelligence on the assumption that greater intelligence would entail greater moral behavior.5

Yudkowsky came to reject this assumption around 2003, concluding that the alignment problem—ensuring that a superintelligent system acts in accordance with human values—was technically extremely difficult and had not been solved.5 The institute subsequently shifted focus to alignment research. In 2005, SIAI relocated from Atlanta to Silicon Valley and formally reoriented toward identifying and managing existential risks from AI.5 In January 2013, SIAI was rebranded as the Machine Intelligence Research Institute (MIRI).4

Intellectual Evolution

Yudkowsky's views on AI risk have shifted substantially over his career. In 2001, he reportedly believed his team could develop a "final stage AI" that would "reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010."6 Separately, postings from the early 2000s indicate he expected the Singularity around 2025.7

By the late 2000s, he had moved away from specific timeline predictions. In a 2019 post, he stated he had "a policy for over a decade of not putting numbers on a few things, one of which is AGI timelines and one of which is non-relative doom probabilities," citing instability in his own estimates as the reason.8

His assessment of the danger posed by advanced AI became more severe over time. In 2021, Yudkowsky stated that compared to his positions during the 2008 AI-Foom debate, "reality has proved way to the further Eliezer side," adding that he found it "very unpleasantly surprising how little architectural complexity is required to start producing generalizing systems."9 By 2022, he was publicly expressing what observers characterized as near-certainty of catastrophic outcomes from AGI development.10

Current Role (2024–2025)

As of 2024, Yudkowsky remains a senior researcher at MIRI. MIRI's Agent Foundations research program, which he helped establish and which had run since approximately 2013, was officially discontinued in 2024.11 MIRI has pivoted to prioritize policy advocacy, public communications, and technical governance research over technical alignment research.12

Yudkowsky's current activities are focused primarily on public outreach and policy advocacy. In 2024, he appeared in interviews with PBS NewsHour and The Atlantic, and he has been active on X (formerly Twitter), where he has a substantial following and regularly engages with AI development news.11 He gave a TED Talk in early 2023 following his TIME op-ed.2

Key Contributions to AI Safety

Coherent Extrapolated Volition (CEV)

Proposed in a 2004 paper published through the Singularity Institute, CEV attempts to formalize a goal for a superintelligent system: act on what humanity would collectively want "if we knew more, thought faster, were more the people we wished we were, had grown up farther together."13 Rather than specifying human values directly—which Yudkowsky argued was intractable—CEV proposes extrapolating what we would choose under idealized conditions.

Yudkowsky himself described the concept as outdated shortly after publishing it and warned against treating it as a practical implementation strategy.14 The proposal has drawn critiques from multiple directions: that human values may not be coherent enough to extrapolate, that the process of extrapolation is itself a value-laden choice, and that the aggregation of individual volitions into a collective one faces problems analogous to Arrow's impossibility theorem.15 CEV has also been noted to resemble philosophical frameworks such as "full-information accounts of value" and "ideal observer theories," though Yudkowsky developed it independently of that literature.15 Many researchers at organizations such as Anthropic and Google DeepMind do not use CEV as a working framework, and it has not been formalized mathematically.
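The aggregation worry can be made concrete with the classic Condorcet cycle. The sketch below is purely illustrative and is not drawn from the CEV paper: three agents with individually coherent rankings over options A, B, and C produce a pairwise-majority relation with no stable winner, the same structural obstacle that Arrow-style critiques raise for aggregating extrapolated volitions.

```python
from itertools import combinations

# Three hypothetical agents with cyclic preferences over options A, B, C
# (the classic Condorcet example). Earlier-listed options are preferred.
voters = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def pairwise_winner(x, y):
    """Return the option preferred by a majority in a head-to-head contest."""
    votes_for_x = sum(1 for ranking in voters if ranking.index(x) < ranking.index(y))
    return x if votes_for_x > len(voters) / 2 else y

for x, y in combinations(["A", "B", "C"], 2):
    print(f"{x} vs {y}: majority prefers {pairwise_winner(x, y)}")

# Output:
# A vs B: majority prefers A
# A vs C: majority prefers C
# B vs C: majority prefers B
# Every option loses some pairwise contest, so simple majority aggregation
# yields no stable collective "extrapolated volition" for this population.
```

With more homogeneous preference profiles the cycle disappears, which is why this line of criticism is usually framed as a question about how much coherence human values actually have rather than as a decisive result against CEV.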

An extension, "Coherent, Extrapolated and Clustered Volition" (CECV), has been proposed by later researchers to better reflect the moral preferences of culturally diverse groups.14

Early Warning and Problem Formulation

Yudkowsky was among the first to systematically articulate the alignment problem in writing accessible to non-specialists, beginning in the late 1990s and continuing through the 2000s. His early contributions included:

  • Articulating why a sufficiently capable AI system optimizing for a misspecified objective could produce catastrophic outcomes
  • Explaining the difficulty of value specification as a technical problem, not merely a philosophical one
  • Highlighting the potential for "treacherous turns" in AI development—where a system behaves as intended during training and testing but pursues different goals once deployed
  • Arguing that alignment must be solved before AGI is developed, rather than after

His 2008 chapter "Artificial Intelligence as a Positive and Negative Factor in Global Risk," published in the Oxford University Press volume Global Catastrophic Risks edited by Nick Bostrom and Milan Ćirković, was an early peer-reviewed articulation of these concerns.

Agent Foundations Research

Through MIRI, Yudkowsky helped establish and promote a research agenda focused on foundational questions about rational agency. This program, which ran from approximately 2013 until its discontinuation in 2024,11 included work on:

  • Logical uncertainty: How to reason probabilistically about mathematical and logical claims
  • Naturalized induction: How to formulate induction for agents embedded in the world they are reasoning about
  • Timeless Decision Theory (TDT) and Updateless Decision Theory (UDT): Proposed alternatives to standard causal and evidential decision theory, intended to handle problems like Newcomb's problem and coordination scenarios more robustly (a brief numerical illustration appears at the end of this subsection)
  • Embedded agency: How to formalize agency for systems that are physical parts of the environments they reason about
  • Reflective consistency: Whether an agent's decision procedure is stable under self-modification

This research agenda has not been adopted as a primary framework by most AI safety researchers, though some of the problems it identified (particularly around decision theory and embedded agency) continue to be discussed in the field.
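As a rough illustration of the decision problems this agenda engaged with, consider Newcomb's problem with hypothetical payoffs (the dollar amounts and predictor-accuracy figures below are illustrative assumptions, not values from MIRI's papers): the expected payoff of one-boxing versus two-boxing depends sharply on how reliable the predictor is, and TDT/UDT were motivated partly by the judgment that standard causal decision theory handles such cases poorly.

```python
# Newcomb's problem, illustrative numbers: an opaque box contains $1,000,000
# if a predictor forecast that the agent will take only that box, and nothing
# otherwise; a transparent box always contains $1,000.
BIG, SMALL = 1_000_000, 1_000

def expected_payoff(one_box: bool, predictor_accuracy: float) -> float:
    """Expected dollars given the agent's choice and the predictor's accuracy."""
    if one_box:
        # The predictor correctly foresaw one-boxing with probability `predictor_accuracy`.
        return predictor_accuracy * BIG
    # Two-boxing: the agent gets SMALL, plus BIG only if the predictor erred.
    return SMALL + (1 - predictor_accuracy) * BIG

for accuracy in (0.5, 0.9, 0.99):
    print(accuracy, expected_payoff(True, accuracy), expected_payoff(False, accuracy))

# With a 99%-accurate predictor, one-boxing expects ~$990,000 versus ~$11,000
# for two-boxing -- the kind of case where Yudkowsky's proposed decision
# theories endorse one-boxing, while causal decision theory, treating the
# boxes' contents as already fixed, recommends two-boxing.
```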

Founding of LessWrong and the Rationalist Community

Between 2006 and 2009, Yudkowsky wrote an extended series of blog posts on rationality, cognitive biases, and AI risk, initially published on Overcoming Bias (a blog he shared with economist Robin Hanson) and later collected on LessWrong. LessWrong was launched in 2009, built on the community that had developed around Overcoming Bias.1 These posts, collectively called "the Sequences," cover topics including Bayesian reasoning, philosophy of mind, decision theory, and the nature of intelligence.

The Sequences became foundational reading in the rationalist community and, through that community, influenced a generation of people who went on to work in AI safety. Yudkowsky's writing also contributed to the establishment of the Center for Applied Rationality (CFAR), which runs workshops applying rationality techniques.

Fiction

Yudkowsky has written fiction exploring AI and rationality themes. His most widely read work is Harry Potter and the Methods of Rationality (HPMOR), a fan-fiction novel begun in 2010 and completed in 2015, which uses the Harry Potter setting to explore rationalist and scientific thinking. He has also written shorter science fiction, including "Three Worlds Collide" (2009), a story exploring decision theory and the challenges of communication between radically different minds.

Views on Key Cruxes

Risk Assessment

P(doom): Yudkowsky has a long-standing policy of not giving precise numerical probability estimates for AI doom, citing instability in his own estimates and concern that such numbers are not meaningful without specifying underlying models.8 Community sources and survey data widely place his implicit estimate above 95%; the Wikipedia P(doom) article, citing a survey, lists his estimate as >95%.16 In 2022, observers noted he was expressing near-certainty of catastrophic outcomes following his "Death with Dignity" post.10 He has argued that p(doom) as a concept is problematic: "people with the same numbers may have wildly different models" and that the social function of p(doom) estimates resembles "some sort of weird astrological sign."17

Timeline: Believes AGI is plausible within the coming decades, but as noted above, has maintained a policy of not giving specific timeline estimates for over a decade.

Alignment difficulty: Considers alignment extremely difficult, very likely requiring fundamental theoretical advances that have not yet been made. He does not believe current approaches, including techniques like RLHF, will scale to superintelligent systems.

Core Concerns

  1. Default outcome is catastrophic: Yudkowsky argues that without major theoretical breakthroughs in alignment, AGI development will likely produce systems that pursue goals incompatible with human survival
  2. Sharp left turn: Expects that capability gains will at some point outpace the ability to align systems, potentially very rapidly
  3. Deceptive alignment: Concerned that sufficiently capable systems will learn to appear aligned during training while pursuing different goals when deployed
  4. Inadequate preparation: Argues that current alignment research efforts are insufficient relative to the difficulty of the problem, and that the field is not on a trajectory to solve alignment before AGI is developed

Disagreements with Mainstream

Yudkowsky holds more pessimistic views on alignment tractability than most AI safety researchers:

AI Alignment Difficulty and Tractability

Researcher | Position | Reasoning | Confidence
Eliezer Yudkowsky | Extremely difficult; current approaches are unlikely to scale; fundamental theoretical breakthroughs not yet identified are required; default outcome is catastrophic | Sharp capability jumps, deceptive alignment, inner alignment problems, and the difficulty of value specification make prosaic approaches insufficient | High
Paul Christiano | Difficult but tractable with empirical iteration on prosaic alignment techniques; ~50% p(doom) | Incremental progress is possible; alignment can be learned from weaker systems before deploying stronger ones | Medium
Dario Amodei | Challenging but manageable with responsible scaling and empirical research; ~10–25% p(doom) | Safety research can keep pace with capabilities if development is careful and deliberate | Medium
Yoshua Bengio | Significant risk; ~50% p(doom); shares concern about catastrophic outcomes but emphasizes different technical approaches including more interpretable AI systems | Current deep learning systems may not be the right substrate for safe AGI; governance and interpretability research are priorities | Medium
Yann LeCun | Existential risk from current AI architectures is very low (<0.01% p(doom)); current systems are not on a path to AGI | LLMs and current neural networks lack the architectural properties needed for human-level general intelligence; risk framing is premature | High

Strategic Views

On Current AI Development

Yudkowsky has advocated for:

  • Halting large training runs: His 2023 TIME op-ed called for an indefinite worldwide moratorium on new large AI training runs, enforced by international agreement.18 He declined to sign the broader open letter calling for a six-month pause because he considered it to understate the seriousness of the situation and to ask for too little.18
  • International cooperation with enforcement: Has proposed that international agreements on AI development require "actual teeth," including the possibility of enforcement against rogue projects.12
  • Stopping rather than racing safely: Argues that efforts to "build AGI safely first" presuppose a solution to alignment that does not exist, and that the responsible course is to stop until alignment is solved

On Alignment Approaches

  • Skeptical of prosaic alignment: Argues that techniques like RLHF and related empirical methods are unlikely to scale to superintelligent systems
  • Emphasis on theory: Believes foundational theoretical work on agency and value specification is needed before scaling
  • Critical of capability-first approaches: Argues that building more capable systems before solving alignment increases rather than reduces risk
  • Not optimistic about interpretability alone: While not dismissing interpretability research, does not believe it is sufficient to solve the alignment problem

Key Publications and Writings

  • "Coherent Extrapolated Volition" (2004) — Foundational paper proposing a framework for specifying AI goals through extrapolation of human values [PDF]
  • "Intelligence Explosion Microeconomics" (2013) — Analyzes economic dynamics of recursive self-improvement [PDF]
  • "There's No Fire Alarm for Artificial General Intelligence" (2017) — Argues the field will not receive clear warning signs before AGI
  • "AGI Ruin: A List of Lethalities" (June 2022) — Extended argument for why default outcomes from AGI development are catastrophic; sparked widespread debate across LessWrong, the Alignment Forum, and the Centre for Effective Altruism19
  • "Pausing AI Developments Isn't Enough. We Need to Shut it All Down" (March 29, 2023, TIME) — Op-ed that was the most-viewed page on TIME's website for a week; prompted questioning at a White House press briefing2
  • The Sequences (2006–2009) — Extended blog post series on rationality, cognitive biases, and AI safety, initially published on Overcoming Bias and LessWrong; foundational reading in the rationalist community
  • "Three Worlds Collide" (2009) — Short science fiction story exploring decision theory and contact with alien intelligences
  • "Harry Potter and the Methods of Rationality" (2010–2015) — Fan-fiction novel using the Harry Potter setting to explore rationalist and scientific reasoning

Reception of "AGI Ruin: A List of Lethalities"

The June 2022 post "AGI Ruin: A List of Lethalities" drew substantial responses across the AI safety community. Zvi Mowshowitz described it as "the definitive list of Eliezer Rants About Why AGI Will Definitely Absolutely For Sure Kill Everyone Unless Something Very Unexpected Happens," calling it "excellent" while expressing concern that its arguments would primarily persuade those already convinced.20 Paul Christiano's response ("Where I agree and disagree with Eliezer") found that doom was "pretty likely" while identifying specific flaws in Yudkowsky's arguments for near-inevitability, particularly around takeoff speed and the tractability of interpretability research.21 A 2024 peer-reviewed paper in Discover Artificial Intelligence engaged with the essay, finding that "contemporary artificial intelligence systems are, to some extent, intelligent" while arguing that AI systems without human-defined goals would not inherently pose existential threats.22

Influence and Legacy

Yudkowsky's impact on the AI safety field extends beyond his direct technical contributions:

  1. Problem formulation: His early writings articulated key problems—alignment, value specification, deceptive alignment—that shaped subsequent research agendas at organizations including MIRI, Anthropic, and Google DeepMind
  2. Community infrastructure: Created LessWrong (2009) and contributed to the intellectual environment that produced many current AI safety researchers; helped found CFAR
  3. Public discourse: Through writing, fiction, and media appearances, introduced AI risk to audiences beyond the technical community; his 2023 TIME op-ed reached mainstream political attention2
  4. Funding landscape: His early work influenced the development of the AI safety funding ecosystem; Coefficient Giving (formerly Open Philanthropy, renamed in November 2025) has made more than 440 grants through its Navigating Transformative AI fund and has directed more than $205M toward AI safety since January 202423
  5. Nick Bostrom's Superintelligence: His work on the intelligence explosion directly influenced Bostrom's 2014 book, which brought AI risk to a wider academic and policy audience1

Criticism and Controversy

Technical and Methodological Critiques

Critics from within and outside the AI safety community have raised substantive technical objections:

  • Alignment tractability: Researchers including Paul Christiano have argued that Yudkowsky's arguments for near-inevitability of doom rely on assumptions about takeoff speed and the impossibility of incremental alignment progress that are contested21
  • LLM capabilities: Yann LeCun and other mainstream ML researchers dispute the framing that current AI architectures are on a path to AGI, arguing that the existential risk framing is premature given the architectural limitations of current systems
  • Agent Foundations output: MIRI's Agent Foundations research program, which Yudkowsky was associated with, did not produce the theoretical breakthroughs it aimed for and was discontinued in 202411
  • CEV underspecification: Critics have argued that CEV remains underspecified, that extrapolation is itself a value-laden process, and that human values may not converge in the way CEV assumes15

Sociological and Epistemic Critiques

  • Credentialism: Some critics argue that his lack of formal academic credentials means his technical claims have largely bypassed the peer review and accountability structures that normally apply to such work; supporters note that he has engaged with formal technical problems in decision theory
  • Communication style: Has been characterized by some observers as dismissive of approaches that differ from his own; his "AGI Ruin" post explicitly stated skepticism that researchers working in other paradigms would produce useful alignment work19
  • Unfalsifiability: His core predictions about catastrophic AI outcomes are not falsifiable until AGI exists, making it difficult to evaluate them empirically in advance
  • Persuasive reach: Zvi Mowshowitz noted concern that Yudkowsky's arguments about doom, however internally consistent, would primarily persuade those already inclined to agree rather than convince skeptical audiences20

Responses from Supporters

Supporters argue that Yudkowsky identified genuine problems with AI development that others initially dismissed; that his decision theory work demonstrates technical competence; that pessimism about alignment may be warranted given the stakes; and that his public communication, while direct, has raised awareness of issues that now receive mainstream policy attention.

The Hanson-Yudkowsky AI-Foom Debate

In late 2008, Yudkowsky engaged in an extended public debate with economist Robin Hanson on the Overcoming Bias blog, focused on whether AI systems would undergo rapid recursive self-improvement ("FOOM")—a fast, local increase in capability on a timescale of weeks or hours rather than years or decades.24 Hanson argued that AI progress would more closely resemble prior general-purpose technologies like agriculture and industrialization, with innovations distributed across many competing firms rather than concentrated in a single rapidly self-improving system.24 The debate continued with an in-person exchange at Jane Street Capital in June 2011, and MIRI published the exchange as a book in 2013.24 Hanson made a follow-up post in 2014 ("I Still Don't Get Foom") and Yudkowsky revisited the debate in 2017 following AlphaGo Zero.25

A retrospective analysis on LessWrong examining the incidental predictions each party made during the debate concluded that "Hanson probably does a little better than Yudkowsky" on those subsidiary claims.26 The question of whether the development of Large Language Models better supports Yudkowsky's or Hanson's broader thesis about AI architecture and takeoff dynamics remains contested among observers; the two positions make different predictions about what future AI development will look like.
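One way to see what was at stake in the takeoff question is a toy reinvestment model. The sketch below is illustrative only and was not used by either debater: if capability grows in proportion to capability raised to a power r, the assumed return on cognitive reinvestment, then r > 1 produces runaway, finite-time blow-up of the kind "FOOM" gestures at, while r ≤ 1 gives exponential or slower growth closer to Hanson's picture of distributed, incremental progress.

```python
# Toy reinvestment model (illustrative only; not a model used by Hanson or
# Yudkowsky). Capability C grows at rate k * C**r, where r is the assumed
# return on "cognitive reinvestment". r > 1 blows up in finite time
# ("FOOM"-like); r <= 1 gives exponential or slower growth.
def trajectory(r: float, k: float = 1.0, dt: float = 0.01, steps: int = 2000):
    c = 1.0
    for step in range(steps):
        c += k * (c ** r) * dt          # simple Euler step
        if c > 1e12:                    # treat this as "effectively diverged"
            return c, True, step * dt
    return c, False, steps * dt

for r in (0.5, 1.0, 1.5):
    c, diverged, t = trajectory(r)
    print(f"r={r}: t={t:.2f}, capability={c:.3g}, diverged={diverged}")
```

Run with these assumptions, r = 0.5 and r = 1.0 stay finite over the simulated horizon while r = 1.5 diverges after roughly two time units, which is the qualitative asymmetry the debate turned on rather than any specific numbers.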

Statements & Track Record

For a detailed analysis of Yudkowsky's predictions and their accuracy, see the Statements & Track Record table below.

Summary: Made early timeline predictions in the 2000s that proved inaccurate (e.g., transhumanity by "2008 or 2010" per a 2001 account6); subsequently adopted a policy of not giving specific timeline estimates.8 His position that AI systems would generalize effectively with relatively simple architectures was more consistent with subsequent LLM development than Hanson's position in their 2008 debate, though retrospective analysis of incidental predictions gives Hanson a slight edge overall.26 Core predictions about catastrophic AI risk outcomes remain unfalsifiable until AGI exists.

Category | Examples | Notes
Consistent with later developments | AI generalization with relatively simple architectures; AI safety becoming a mainstream policy concern | His position in the Hanson debate that generalization would be achievable without requiring symbolic AI was more consistent with LLM developments; his early emphasis on AI safety preceded mainstream attention by years
Inaccurate | Early timeline predictions (transhumanity by "2008 or 2010" per a 2001 account6); Singularity by 2025 per early 2000s writings7 | He subsequently acknowledged these errors and stopped making specific timeline predictions
Pending / not yet resolvable | P(doom) >95%16; discontinuous takeoff; deceptive alignment at scale | Core predictions about catastrophic AI risk are not falsifiable prior to AGI development

Yudkowsky himself acknowledged that his p(doom) estimates have not been stable over time and has expressed skepticism about the value of precise numerical estimates, arguing that identical numbers can reflect very different underlying models.17


Footnotes

  1. Claim reference cr-66fa (data unavailable — rebuild with wiki-server access)

  2. MIRI, "April 2024 Newsletter," Machine Intelligence Research Institute, April 12, 2024, https://intelligence.org/2024/04/12/april-2024-newsletter/.

  3. Future of Life Institute, "MIRI: Artificial Intelligence — The Danger of Good Intentions," https://futureoflife.org/ai/ai-the-danger-of-good-intentions/.

  4. Issa Rice, "Timeline of Machine Intelligence Research Institute," timelines.issarice.com, https://timelines.issarice.com/wiki/Timeline_of_Machine_Intelligence_Research_Institute. (SIAI incorporated July 27, 2000; rebranded as MIRI January 2013.)

  5. Claim reference cr-7d96 (data unavailable — rebuild with wiki-server access)

  6. Claim reference cr-0cf1 (data unavailable — rebuild with wiki-server access)

  7. Various sources cite early 2000s writings in which Yudkowsky predicted the Singularity by approximately 2025; see, e.g., archived extropian mailing list posts cited at https://www.realtimetechpocalypse.com/p/stop-believing-the-lie-that-agi-is.

  8. Eliezer Yudkowsky, post on X (Twitter), August 21, 2019, https://x.com/esyudkowsky/status/1164332124712738821.

  9. Eliezer Yudkowsky, "Discussion with Eliezer Yudkowsky on AGI Interventions," MIRI, November 11, 2021, https://intelligence.org/2021/11/11/discussion-with-eliezer-yudkowsky-on-agi-interventions/.

  10. EA Forum contributor, "On Deference and Yudkowsky's AI Risk Estimates," 2022 (written following Yudkowsky's "Death with Dignity" LessWrong post; notes he was "publicly express[ing] the view that misaligned AI has a virtually 100% chance of killing everyone on Earth").

  11. MIRI, "September 2024 Newsletter," Machine Intelligence Research Institute, September 16, 2024, https://intelligence.org/2024/09/16/september-2024-newsletter/.

  12. MIRI, "MIRI 2024 Mission and Strategy Update," Machine Intelligence Research Institute, January 4, 2024, https://intelligence.org/2024/01/04/miri-2024-mission-and-strategy-update/.

  13. Eliezer Yudkowsky, "Coherent Extrapolated Volition," Singularity Institute, 2004, https://intelligence.org/files/CEV.pdf.

  14. Claim reference cr-3309 (data unavailable — rebuild with wiki-server access)

  15. LessWrong contributor, "Superintelligence 23: Coherent Extrapolated Volition," LessWrong, https://www.lesswrong.com/posts/EQFfj5eC5mqBMxF2s/superintelligence-23-coherent-extrapolated-volition; Alexander Kruel, "Objections to Coherent Extrapolated Volition," LessWrong, https://www.lesswrong.com/posts/JhB9eqJDScjDNpWiS/objections-to-coherent-extrapolated-volition.

  16. Claim reference cr-f00a (data unavailable — rebuild with wiki-server access)

  17. Citation rc-78c4 (data unavailable — rebuild with wiki-server access)

  18. Claim reference cr-c48b (data unavailable — rebuild with wiki-server access)

  19. Eliezer Yudkowsky, "AGI Ruin: A List of Lethalities," AI Alignment Forum, June 2022, https://www.alignmentforum.org/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities.

  20. Claim reference cr-c42f (data unavailable — rebuild with wiki-server access)

  21. Better Without AI, "Reviews of Some Major AI Safety Reports," https://betterwithout.ai/AI-safety-reviews. (Summarizes Paul Christiano's "Where I agree and disagree with Eliezer" as finding that "doom is pretty likely, but pointing out flaws in Yudkowsky's arguments that it is virtually inevitable.")

  22. Claim reference cr-a003 (data unavailable — rebuild with wiki-server access)

  23. Coefficient Giving, "Open Philanthropy Is Now Coefficient Giving," November 2025, https://coefficientgiving.org/research/open-philanthropy-is-now-coefficient-giving/; Wikipedia contributors, "Coefficient Giving," Wikipedia, https://en.wikipedia.org/wiki/Coefficient_Giving.

  24. Robin Hanson and Eliezer Yudkowsky, The Hanson-Yudkowsky AI-Foom Debate, MIRI, 2013, https://intelligence.org/files/AIFoomDebate.pdf. (Original debate: Overcoming Bias, late 2008; in-person debate: Jane Street Capital, June 2011.)

  25. AI Alignment Forum editors, "The Hanson-Yudkowsky AI-Foom Debate," https://www.alignmentforum.org/revisions/w/the-hanson-yudkowsky-ai-foom-debate.

  26. LessWrong contributor, "Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better?" LessWrong, https://www.lesswrong.com/posts/gGSvwd62TJAxxhcGh/yudkowsky-vs-hanson-on-foom-whose-predictions-were-better.


Structured Data

All Facts

People
Property | Value | As Of
Employed By | Machine Intelligence Research Institute | Jan 2000
Role / Title | Founder & Senior Research Fellow | Jan 2000

Biographical
Property | Value
Birth Year | 1979
Education | Self-taught; no formal university education
Notable For | Founder of MIRI; pioneer of AI alignment as a field; author of 'The Sequences' on rationality; author of Harry Potter and the Methods of Rationality; prominent AI doomer
Social Media | @ESYudkowsky
Wikipedia | https://en.wikipedia.org/wiki/Eliezer_Yudkowsky

General
Property | Value
Website | https://www.yudkowsky.net

Career History

Organization | Title | Start | End
Machine Intelligence Research Institute | Founder & Senior Research Fellow | 2000 |

Related Pages

Concepts: Superintelligence · Existential Risk from AI · RLHF · Self-Improvement and Recursive Enhancement

Organizations: Center for Applied Rationality

Other: Robin Hanson · Yann LeCun

Key Debates: AI Alignment Research Agendas · Why Alignment Might Be Hard · AI Accident Risk Cruxes · The Case For AI Existential Risk

Historical: The MIRI Era · Deep Learning Revolution Era

Approaches: Corporate AI Safety Responses

Analysis: Power-Seeking Emergence Conditions Model · Worldview-Intervention Mapping · Timelines Wiki · Stampy / AISafety.info

Safety Research: Interpretability · Prosaic Alignment