The MIRI Era (2000-2015)
Comprehensive chronological account of AI safety's institutional emergence (2000-2015), from MIRI's founding through Bostrom's Superintelligence to mainstream recognition. Covers key organizations, ideas (orthogonality thesis, instrumental convergence, CEV), and the transition from philosophy to technical research, but offers minimal novel analysis or actionable insights for current prioritization work.
Overview
The period from 2000 to 2015 marked the transition of AI safety from individual warnings to organized research institutions. This era saw the founding of the first dedicated AI safety organization (MIRI), the development of foundational theoretical frameworks, and the gradual emergence of a research community. By 2015, AI safety had achieved greater academic visibility and mainstream attention, though it remained a small field largely separate from the broader machine learning community.
Key developments included:
- Establishment of dedicated research organizations (MIRI in 2000, FHI in 2005)
- Formation of online communities focused on rationality and AI risk (LessWrong in 2009)
- Development of core theoretical concepts (orthogonality thesis, instrumental convergence, treacherous turn)
- Publication of Superintelligence by Nick Bostrom (2014)
- High-profile public statements from technology leaders and scientists (2014–2015)
- Initial emergence of dedicated funding sources
The era concluded with AI safety transitioning from primarily philosophical inquiry toward engagement with practical machine learning systems, setting the stage for the deep learning era that followed. Retrospective assessments of this period vary: some researchers credit it with establishing essential intellectual foundations, while others argue that the dominant research agenda—centered on idealized rational agents rather than practical ML systems—was poorly matched to the AI development trajectories that actually emerged.
The Singularity Institute (2000)
Founding
The Singularity Institute for Artificial Intelligence (SIAI) was founded in 2000 by Eliezer Yudkowsky, Brian Atkins, and Sabine Atkins; it was renamed the Machine Intelligence Research Institute (MIRI) in 2013. The organization's stated mission was to research and develop "Friendly AI"—artificial intelligence systems designed to be safe and beneficial to humanity.
Context and Motivations
The founding occurred during a period of renewed optimism about artificial intelligence. The dot-com boom was creating technological enthusiasm, computing power was increasing substantially, and the "AI winter" of reduced funding and interest that had characterized the 1980s and 1990s was ending. New techniques in machine learning were beginning to show promise.
The founders argued that if progress in AI capabilities was resuming, safety research needed to begin before systems became sufficiently capable to pose risks. This represented a departure from the prevailing view in the AI research community, which generally considered safety concerns premature given the limited capabilities of existing systems.
Early Activities (2000–2005)
During its first five years, SIAI operated with minimal funding and a small team. Activities included:
- Theoretical research on "Friendly AI" concepts
- Writing and outreach to raise awareness of AI risk
- Organizing small workshops and conferences
- Seeking funding from foundations and individual donors
The organization's work received limited attention from the academic AI research community. Common criticisms characterized the work as overly speculative, focused on problems that did not yet exist, more aligned with science fiction than scientific research, and a distraction from practical AI development challenges. These criticisms reflected substantive disagreements about whether AGI was a near-term prospect worth preparing for, and whether the conceptual frameworks being developed had the rigor required for actionable research.
Eliezer Yudkowsky and Early Theoretical Work
Background
Eliezer Yudkowsky was self-taught in AI and related fields without formal academic credentials. He had written about artificial intelligence since his teenage years. This background shaped his intellectual approach in distinct ways: his thinking was less constrained by conventional academic frameworks, while his lack of traditional credentials made it easier for critics to set aside his arguments without detailed engagement.
Creating Friendly AI (2001)
In 2001, Yudkowsky published "Creating Friendly AI," his first major document outlining a technical approach to AI safety. The document presented several core arguments:
Default Risk: Without dedicated safety work, advanced AI systems would be dangerous by default. This claim rested on several premises: that intelligence does not inherently produce benevolent behavior, that small differences in goal specifications can lead to large differences in outcomes, and that humanity would have limited opportunities to correct mistakes after deploying superintelligent systems.
The Goal Specification Problem: Intelligence alone is insufficient for beneficial AI; systems require correctly specified goals. The document identified challenges including formally specifying human values, preventing unintended goal drift during system operation and improvement, and managing goal evolution as systems learn and develop.
Technical Requirements: Yudkowsky framed AI safety as an engineering problem requiring formal solutions, not merely a philosophical question. He argued for formal frameworks to represent goals, provable guarantees of goal stability, and mechanisms to protect against unintended optimization.
Reception
The AI research community's response to "Creating Friendly AI" was largely skeptical. Researchers commonly argued that artificial general intelligence (AGI) remained far in the future, making safety work premature. Transhumanist thinkers often viewed the document as overly pessimistic about AI's potential benefits. Academic philosophers found the arguments interesting but insufficiently grounded to merit serious research investment. These responses left MIRI operating at the margins of mainstream discourse on AI.
LessWrong and Community Formation (2006–2012)
Origins
In 2006, Robin Hanson launched the Overcoming Bias blog, with Yudkowsky as a principal early contributor. In 2009, Yudkowsky launched LessWrong.com as a dedicated community website focused on improving human rationality and discussing existential risks, particularly from AI.
The Sequences
Between 2006 and 2009, Yudkowsky wrote an extensive series of blog posts collectively known as "The Sequences," covering topics including cognitive biases, probability theory and Bayesian reasoning, decision theory, philosophy of mind, and interpretations of quantum mechanics, in addition to AI safety.
Key essays relevant to AI safety included:
- "The AI-Box Experiment"
- "Coherent Extrapolated Volition"
- "Artificial Intelligence as a Positive and Negative Factor in Global Risk"
- "Complex Value Systems"
The Sequences created a coherent intellectual framework and vocabulary that influenced the developing AI safety community.
The AI-Box Experiment
The AI-Box Experiment explored whether a superintelligent AI could convince a human gatekeeper to release it from containment. Yudkowsky conducted text-based roleplay experiments in which he took the role of the AI and attempted to convince human participants to "release" him. In several trials, participants agreed to release the AI despite having significant incentives not to do so.
Yudkowsky argued this demonstrated that physical containment would be insufficient as a safety measure against sufficiently intelligent systems. Critics questioned how well these experiments generalized, noting that Yudkowsky's personal persuasiveness might not reflect the capabilities of an actual AI system, and that the artificial experimental setup differed significantly from real containment scenarios. These objections pointed to a broader methodological question about whether informal experiments of this kind could support strong claims about AI risk.
Coherent Extrapolated Volition
Yudkowsky proposed Coherent Extrapolated Volition (CEV) as an approach to the value specification problem. Rather than attempting to program current human values directly into an AI system, CEV proposed programming a process to determine what humans would want under idealized conditions of greater knowledge, faster thinking, personal growth, and collective deliberation.
The proposal was intended to address uncertainty about human values and disagreement between different people and cultures. However, significant open questions remained: how to formally specify "what we would want," whether coherent extrapolated volition exists given fundamental value disagreements, how to aggregate potentially conflicting extrapolations across all humans, and whether the concept could be implemented in practice. CEV remained an influential theoretical idea without a clear path to implementation.
Community Characteristics
LessWrong developed a distinctive community with shared vocabulary (Bayesian reasoning, utility functions, alignment), cultural norms (steelmanning arguments, making falsifiable predictions), a network of individuals taking AI risk seriously, and a pipeline recruiting researchers into AI safety work.
The community's demographics skewed toward young adults, men, and people with backgrounds in physics, mathematics, and computer science. Geographic concentration was highest in the San Francisco Bay Area, supplemented by a large online presence. The culture emphasized intellectualism, rationality techniques, long-form discussion, and quantitative thinking.
External observers offered varied assessments of LessWrong's epistemic culture. Some credited it with producing unusually rigorous informal reasoning and helping articulate AI risk arguments with precision. Others criticized insularity, overconfidence in novel frameworks, and a tendency to treat community consensus as a proxy for truth. Both dimensions shaped the community's influence on early AI safety research.
Robin Hanson and the Hanson-Yudkowsky Debate (2008)
Key Disagreements
In 2008, Robin Hanson and Yudkowsky conducted an extended public debate about AI risk that articulated fundamental disagreements still active in contemporary discussions.
Hanson's position included:
- Artificial general intelligence would more likely emerge through whole brain emulation than through de novo AI design
- The transition to advanced AI would be gradual rather than sudden
- Market forces and competitive pressures would shape AI development
- Humans would retain economic value in a world with advanced AI
- Existential risk from AI was lower than Yudkowsky estimated
Yudkowsky's position included:
- De novo AI was more likely than brain emulation as the path to AGI
- Intelligence explosion could occur rapidly through recursive self-improvement
- Market forces do not inherently guarantee safety outcomes
- Humans might have no economic value relative to superintelligent systems
- Without dedicated safety work, existential risk from AI was high
Impact
The debate established frameworks for understanding key disagreements within AI safety discourse:
- Takeoff speed: rapid versus gradual capability gains
- Development paths: brain emulation versus engineered AI
- Economic models: human relevance versus obsolescence
- Research urgency: immediate versus eventual priority
The public nature of the disagreement demonstrated that the AI safety community could engage in substantive intellectual debate rather than functioning as a monolithic group. Many subsequent discussions of AI risk reference or parallel the Hanson-Yudkowsky framing. Hanson later developed his positions on brain emulation into the book The Age of Em (2016), while Yudkowsky continued to argue for rapid-takeoff scenarios and the primacy of the alignment problem.
Dissenting Academic Perspectives During This Period
The Hanson-Yudkowsky debate was the most prominent internal disagreement, but it did not capture the full range of academic skepticism. Several strands of criticism emerged from outside the AI safety community:
Capability skeptics argued that the theoretical frameworks assumed AI capability trajectories that were not well grounded in existing AI research. Researchers including Rodney Brooks and others working in robotics and cognitive science contended that intelligence is deeply embodied and situational, making rapid recursive self-improvement implausible in the ways the MIRI-adjacent literature assumed.
Near-term ethics researchers contended that emphasis on speculative long-run scenarios could crowd out attention to pressing, empirically tractable harms from systems already deployed: algorithmic discrimination, surveillance applications, labor displacement, and adversarial manipulation. This critique was articulated by researchers working in algorithmic fairness and AI ethics who argued that the existential risk framing systematically deprioritized effects on currently affected populations.
Methodological critics questioned whether the field's reliance on thought experiments and informal arguments—rather than empirical investigation of actual machine learning systems—could generate reliable conclusions about AI development trajectories. These critics noted that the persuasiveness of thought experiments like the paperclip maximizer depended on assumptions that were contested rather than established.
These perspectives remained largely separate from the LessWrong-adjacent community during this period, contributing to a bifurcation in AI safety discourse that persisted into the deep learning era.
Nick Bostrom and Academic Legitimacy
Background
Nick Bostrom held a PhD from the London School of Economics and served as a Professor of Philosophy at Oxford University. His academic positioning provided access to institutional resources and audiences that were less accessible to researchers without traditional academic credentials.
Future of Humanity Institute (2005)
Bostrom founded the Future of Humanity Institute (FHI) at Oxford University in 2005. FHI became the first academic research center focused on existential risks, including but not limited to risks from advanced AI. The institute's location within a prestigious university provided institutional standing for research that might otherwise have been set aside as speculative.
Note that FHI closed in 2024 following administrative disputes with the University of Oxford, after nearly two decades of operation. Its closure marked the end of one of the period's most influential institutional legacies.
Existential Risk as a Research Priority
Bostrom developed arguments that even small probabilities of human extinction deserve substantial resource allocation, given the scale of potential future lives that would be foreclosed by such outcomes. This framework suggested that reducing existential risk by even small amounts could be highly valuable in expected utility terms.
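The structure of this reasoning can be written compactly. The following is a schematic rendering with illustrative magnitudes; the symbols and numbers are not taken from Bostrom's own work:

```latex
% Schematic expected-value form of the argument (illustrative notation).
\[
  \mathbb{E}[\text{value of intervention}] \;=\; \Delta p \cdot V
\]
% \Delta p : reduction in extinction probability attributable to the intervention
% V        : value of the long-term future that extinction would foreclose
% If V is taken to be on the order of 10^{16} future lives and \Delta p = 10^{-6},
% the expected value is on the order of 10^{10} lives, large relative to
% interventions evaluated only on their near-term effects.
```

On this accounting, even interventions with a very small probability of making a difference can dominate a prioritization exercise, which is precisely the feature that critics of the framework contested.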
This argument influenced the effective altruism movement's prioritization of AI safety research and other long-term interventions. Critics of this framework argued that it relied on contested assumptions about how to aggregate welfare across vastly different future populations, and that the resulting prioritization could reflect particular philosophical commitments rather than consensus ethical reasoning.
Superintelligence (2014)
Publication and Structure
Nick Bostrom's Superintelligence: Paths, Dangers, Strategies was published by Oxford University Press in 2014. The book provided the first comprehensive, academically structured treatment of AI risk aimed at a general audience.
The book's structure included:
- Analysis of potential paths to superintelligence (AI, whole brain emulation, biological enhancement, networks)
- Taxonomy of superintelligence forms (speed, collective, quality)
- Discussion of superintelligence capabilities and strategic advantages
- Detailed examination of the control problem
- Exploration of strategic implications for humanity
- Assessment of existential risk
The book was structured as a systematic academic argument, with explicit acknowledgment of uncertainties throughout.
Core Theoretical Contributions
The Orthogonality Thesis: Intelligence and goals are independent dimensions. A system can have high intelligence paired with any goal structure. This challenged the assumption that sufficiently intelligent systems would naturally develop benevolent values.
The Instrumental Convergence Thesis: Almost any final goal leads to certain instrumental sub-goals:
- Self-preservation (a system cannot achieve its goals if it is destroyed)
- Resource acquisition (more resources enable better goal achievement)
- Goal preservation (changing goals would prevent achieving current goals)
- Cognitive enhancement (greater intelligence enables better goal achievement)
- Technological advancement (better technology enables better goal achievement)
This thesis suggested that even AI systems with seemingly harmless goals could engage in dangerous behavior while pursuing these instrumental sub-goals.
The Treacherous Turn: A sufficiently intelligent AI might conceal its true goals until it has accumulated enough power to achieve them without human interference. The scenario involves appearing aligned during early development while weak, then pivoting to pursue true goals once success probability is high. This possibility complicates safety verification, as testing may not reveal misalignment. The concept bears a close relationship to what later researchers would term deceptive alignment.
The Paperclip Maximizer: Bostrom popularized this thought experiment, in which an AI system tasked with maximizing paperclip production converts all available matter (including humans and Earth) into paperclips. While simplified, the scenario illustrated how misspecified goals could lead to catastrophic outcomes even with seemingly innocuous objectives.
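To make the logic of the thought experiment concrete, a deliberately crude sketch follows. The resource names, yields, and greedy "planner" are all hypothetical; the point is only that an objective counting nothing but paperclips gives an optimizer no reason to spare anything else:

```python
# A deliberately simplified illustration of goal misspecification, in the spirit of
# the paperclip maximizer thought experiment. All names and numbers are hypothetical.

def paperclips_produced(allocation: dict[str, float], yield_per_unit: dict[str, float]) -> float:
    """The objective as specified: count paperclips, and nothing else."""
    return sum(units * yield_per_unit[resource] for resource, units in allocation.items())

# Resources in the toy world. The designer implicitly cares about preserving
# 'farmland' and 'cities', but the objective never mentions them.
resources = {"scrap_metal": 100.0, "farmland": 500.0, "cities": 50.0}
yield_per_unit = {"scrap_metal": 10.0, "farmland": 2.0, "cities": 8.0}

# A maximizer of the stated objective converts everything, because converting more
# of any resource always increases the only quantity it is scored on.
greedy_allocation = dict(resources)  # use every unit of every resource
print(paperclips_produced(greedy_allocation, yield_per_unit))   # 2400.0

# The intended behaviour (convert only scrap metal) scores strictly worse under the
# stated objective, so no optimizer of that objective will choose it.
intended_allocation = {"scrap_metal": 100.0, "farmland": 0.0, "cities": 0.0}
print(paperclips_produced(intended_allocation, yield_per_unit))  # 1000.0
```

Any constraint the designer cares about but omits from the objective is, from the optimizer's perspective, simply more feedstock.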
Reception and Impact
Superintelligence received attention from prominent figures including Elon Musk, Bill Gates, and Stephen Hawking, who made public statements expressing concern about advanced AI around the time of its publication. The book received extensive mainstream media coverage and engaged academic audiences beyond philosophy departments.
Some AI researchers criticized the book as speculative or as encouraging undue alarm. Others questioned the plausibility of the scenarios described or disagreed with Bostrom's timeline estimates and risk assessments. Even critics who disputed specific arguments generally acknowledged the book's systematic structure and internal consistency.
The net effect was an increase in attention to AI safety concerns. The book made it harder to characterize AI safety as purely speculative, given its academic origin and structured argumentation—though critics continued to argue that its influence on funding and research priorities was disproportionate to the evidentiary basis of its claims.
Critical Reception
Critics in the AI research community raised several objections:
- The scenarios described relied on speculative assumptions about future AI capabilities
- The focus on superintelligence might distract from near-term AI safety and ethics concerns
- The arguments depended heavily on thought experiments rather than empirical evidence
- Some technical assumptions about intelligence explosion and recursive self-improvement lacked clear empirical grounding
- The book's framing assumed a particular (discontinuous, fast-takeoff) development trajectory that was contested even within the AI safety community
Proponents countered that:
- Long-term safety research requires anticipating future capabilities
- Near-term and long-term safety concerns are complementary, not competing
- Thought experiments are appropriate for exploring scenarios with no historical precedent
- The theoretical framework helps identify research priorities even given uncertainty about timelines
These debates reflected broader disagreements about research prioritization and methodology in AI safety that remained active throughout the subsequent decade.
High-Profile Public Statements (2014–2015)
Technology Leaders and Scientists
Between 2014 and 2015, several prominent technology leaders and scientists made public statements about AI risk. Elon Musk, Stephen Hawking, and Bill Gates each expressed concern about the long-term prospects of advanced AI in interviews and public forums. These statements generated significant mainstream media coverage of AI safety concerns, bringing the topic to general audiences beyond academic and technical communities.
Hawking, in a December 2014 BBC interview, stated that "the development of full artificial intelligence could spell the end of the human race." Musk described AI as potentially "more dangerous than nukes" in a widely reported August 2014 post on Twitter. Gates, in a Reddit AMA in January 2015, expressed that he did not understand why people were not more concerned about the issue.
Effects of Public Attention
The public statements provided visibility for AI safety research and helped attract technical talent to the field. However, they also generated backlash. Some AI researchers argued that the warnings were premature or exaggerated, potentially creating unjustified concern about near-term AI capabilities. Others expressed concern that focus on long-term existential risk might draw attention away from addressing near-term harms from current AI systems, including algorithmic bias, privacy violations, and labor displacement.
AI researchers including Yann LeCun and Andrew Ng publicly disputed the framing, with Ng comparing worries about superintelligent AI to concerns about overpopulation on Mars. These counterarguments reflected a genuine methodological split: critics argued that capability extrapolations underlying AI risk arguments were not grounded in the actual trajectory of machine learning research at the time.
The net effect was increased visibility for AI safety as a research area, alongside polarization of opinion about its importance and urgency among AI researchers.
Emergence of Dedicated Funding (2014–2015)
Funding Landscape Shift
For approximately 15 years, AI safety research operated with minimal funding. The period 2014–2015 marked a significant change in available resources.
Elon Musk made a substantial donation to the Future of Life Institute in 2015 to support AI safety research grants. This funding—reported at $10 million—was distributed across multiple research groups and organizations through a competitive grants process.
Open Philanthropy began investigating potential risks from advanced AI as a priority cause area around this time; in the years that followed it committed substantial grants to MIRI, FHI, and other research organizations, signaling long-term commitment to the area.
Future of Life Institute
The Future of Life Institute was founded in 2014 to coordinate research funding and facilitate dialogue about AI safety. FLI organized conferences bringing together AI researchers, safety researchers, and technology leaders to discuss potential risks and research priorities.
The 2015 Puerto Rico Conference
FLI organized a conference in Puerto Rico in January 2015 attended by leading AI researchers and technology leaders, including Stuart Russell, Demis Hassabis, Nick Bostrom, and Max Tegmark, among others.
The conference led to an open letter on AI safety calling for research to ensure AI systems remain beneficial. The letter attracted thousands of signatories from both the AI research community and the broader scientific community, and represented a notable moment in which a significant number of active AI researchers publicly endorsed AI safety research as a priority. It was also significant for what it was not: the letter carefully focused on research priorities rather than capability moratoriums, reflecting ongoing disagreements about how to frame AI risk in ways that could achieve broad assent.
Technical Research Development (2010–2015)
Transition from Philosophy to Technical Work
MIRI's research focus evolved during this period. Early work (2000–2010) was primarily philosophical, exploring conceptual foundations of AI safety. The mid-period (2010–2015) saw increasing emphasis on technical research problems, though critics argued these remained theoretical problems about idealized agents rather than engagement with practical machine learning systems.
Research Areas
Logical Uncertainty: How should AI systems reason about logical facts they have not yet proven? This problem arises when an agent needs to reason about other agents (including future versions of itself) without infinite computational regress.
Decision Theory: What decision procedures should AI systems use, particularly when other agents can predict those decisions? MIRI researchers explored problems including Newcomb's paradox variations, Prisoner's Dilemma modifications, and acausal cooperation scenarios.
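Newcomb's problem illustrates why the choice of decision procedure matters once another agent can predict the decision. The sketch below uses the standard payoffs together with the common evidential and causal expected-value calculations; it illustrates the puzzle itself, not the alternative decision theories MIRI researchers went on to propose:

```python
# Newcomb's problem with the standard payoffs. The labels "evidential" and "causal"
# follow common usage; this is an illustration of the puzzle, not MIRI's formalism.

PREDICTOR_ACCURACY = 0.99        # probability the predictor anticipates the choice correctly
OPAQUE_BOX_PRIZE = 1_000_000     # opaque box is filled iff one-boxing was predicted
TRANSPARENT_BOX = 1_000          # transparent box always contains $1,000

def evidential_expected_value(one_box: bool) -> float:
    """Condition on the choice as evidence about what the predictor already did."""
    p_opaque_full = PREDICTOR_ACCURACY if one_box else 1 - PREDICTOR_ACCURACY
    payout = p_opaque_full * OPAQUE_BOX_PRIZE
    return payout if one_box else payout + TRANSPARENT_BOX

def causal_expected_value(one_box: bool, p_opaque_full: float) -> float:
    """Treat the opaque box's contents as fixed, independent of the current choice."""
    payout = p_opaque_full * OPAQUE_BOX_PRIZE
    return payout if one_box else payout + TRANSPARENT_BOX

# Evidential reasoning favours one-boxing (about 990,000 vs about 11,000)...
print(evidential_expected_value(True), evidential_expected_value(False))
# ...while causal reasoning favours two-boxing by exactly 1,000 for any fixed belief.
for p in (0.0, 0.5, 1.0):
    print(causal_expected_value(True, p), causal_expected_value(False, p))
```

The two procedures disagree, and disagreements of this kind across a family of such problems were part of what motivated MIRI's work on alternative decision procedures.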
Tiling Agents: Can an AI create a successor system that preserves its original goals? This problem, also called the "stable self-improvement" problem, addresses whether goal preservation is possible through multiple generations of self-modification.
Value Loading: How can human values be incorporated into AI systems? This research area addressed the challenge that humans cannot fully articulate their own values explicitly.
These research directions fell under the umbrella of "agent foundations" research—investigating fundamental theoretical questions about rational agency and goal-directed behavior. The relevance of this research agenda to the deep learning systems that subsequently dominated AI development remained contested; some researchers argue that agent foundations work identified enduring theoretical problems, while others contend it was poorly targeted at the AI architectures that actually emerged.
Academic AI Safety Research
Stuart Russell, co-author of the leading AI textbook Artificial Intelligence: A Modern Approach, began working on AI safety research during this period. Russell developed cooperative inverse reinforcement learning as an approach to value alignment, in which AI systems learn human preferences by observing behavior rather than requiring explicit specification. This work connected more directly to mainstream machine learning methods than much of the contemporaneous MIRI research program.
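A minimal sketch of the underlying idea (inferring what a person values from what they choose, rather than from an explicit specification) follows. It is not Russell's cooperative inverse reinforcement learning formalism; it assumes a small, hypothetical set of candidate reward functions and a noisily rational demonstrator:

```python
import math

# Inferring preferences from observed choices, in the spirit of inverse reinforcement
# learning. Option names, rewards, and the rationality parameter are all hypothetical.

options = ["coffee", "tea", "water"]

# Candidate hypotheses about what the human values: a reward for each option.
candidate_rewards = {
    "likes_caffeine":  {"coffee": 2.0, "tea": 1.5, "water": 0.0},
    "likes_warmth":    {"coffee": 1.0, "tea": 1.0, "water": 0.0},
    "likes_hydration": {"coffee": 0.5, "tea": 0.5, "water": 2.0},
}

def choice_probability(choice: str, rewards: dict[str, float], beta: float = 2.0) -> float:
    """Probability a noisily rational ("Boltzmann") human picks `choice` given these rewards."""
    weights = {option: math.exp(beta * rewards[option]) for option in options}
    return weights[choice] / sum(weights.values())

def posterior(observed_choices: list[str]) -> dict[str, float]:
    """Bayesian update over reward hypotheses, starting from a uniform prior."""
    unnormalized = {}
    for name, rewards in candidate_rewards.items():
        likelihood = 1.0
        for choice in observed_choices:
            likelihood *= choice_probability(choice, rewards)
        unnormalized[name] = likelihood / len(candidate_rewards)
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}

# After repeatedly observing the human choose coffee, the "likes_caffeine" hypothesis
# dominates, without the human ever stating a preference explicitly.
print(posterior(["coffee", "coffee", "tea", "coffee"]))
```

Cooperative inverse reinforcement learning extends this picture to a two-player setting in which the human knows the reward function and the machine is rewarded for optimizing it, but the basic move of treating behavior as evidence about preferences is the same.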
Other early academic work included:
- Research on inverse reinforcement learning techniques
- Safe exploration in reinforcement learning (avoiding catastrophic actions during learning)
- Robustness and adversarial examples (understanding failure modes in machine learning systems; see the sketch after this list)
- Theoretical foundations of multi-agent systems
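For the adversarial-examples line of work referenced above, the sketch below shows the basic mechanism on a hand-specified logistic-regression classifier, in the spirit of the fast gradient sign method. The weights, input, and step size are hypothetical; the point is only that a small, targeted input perturbation can move a confident prediction across the decision boundary:

```python
import numpy as np

# A toy adversarial perturbation against a fixed logistic-regression classifier,
# in the spirit of the fast gradient sign method. All numbers are hypothetical.

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -3.0, 1.0])   # "trained" weights
b = 0.5                          # bias

x = np.array([0.3, -0.1, 0.2])   # an input classified as class 1 with probability ~0.83
y = 1.0                          # true label

p = sigmoid(w @ x + b)
print("original prediction:", p)

# Gradient of the cross-entropy loss with respect to the input is (p - y) * w.
grad_x = (p - y) * w

# Step each coordinate a small amount in the direction that increases the loss.
epsilon = 0.3
x_adv = x + epsilon * np.sign(grad_x)

p_adv = sigmoid(w @ x_adv + b)
print("adversarial prediction:", p_adv)  # ~0.45: the predicted class has flipped
```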
The 2016 paper "Concrete Problems in AI Safety" (developed during 2015 by researchers at Google Brain, OpenAI, Stanford, and UC Berkeley, including Dario Amodei, Chris Olah, and Paul Christiano) outlined research directions connecting theoretical safety concerns to practical machine learning challenges, and was widely seen as a signal that mainstream ML researchers were engaging with safety questions in a more grounded way.
Limitations and Gaps (2000–2015)
Technical Progress
While philosophical frameworks were established during this period, concrete technical results applicable to existing AI systems remained limited. Much of the research focused on theoretical problems related to idealized rational agents rather than practical machine learning systems.
Relationship with Machine Learning Community
Most mainstream AI researchers during this period considered AI safety research either premature or based on implausible assumptions about future AI development. This created a disconnect between AI safety researchers and the ML community building increasingly capable systems. The disconnect was not merely cultural: the two communities worked on different problem framings, used different methodologies, and often operated with different assumptions about AI development trajectories.
Focus on Fast Takeoff Scenarios
Significant attention was devoted to scenarios involving rapid capability gain through recursive self-improvement (sometimes called fast takeoff or "FOOM" scenarios). Critics both inside and outside the AI safety community argued that this emphasis potentially neglected slower, more gradual paths to advanced AI that might pose different challenges and opportunities for safety interventions. Retrospective assessments by some AI safety researchers have since questioned whether the agent foundations research program of this period was well-targeted at the AI development trajectories that actually emerged.
Governance and Coordination
Research during this period emphasized technical approaches to safety, with less attention to governance mechanisms, policy interventions, and international coordination challenges. These areas would become more prominent in subsequent years.
Current Systems vs. Hypothetical AGI
The research focus was primarily on hypothetical artificial general intelligence systems rather than improving the safety of contemporary machine learning systems. This created a gap in addressing near-term safety challenges in deployed AI applications.
Empirical Work
Research was predominantly theoretical, relying on thought experiments and formal analysis rather than empirical investigation with actual machine learning systems. This limited the field's ability to test ideas and iterate based on experimental results.
Geographic and Demographic Concentration
The AI safety research community of this period was concentrated in a small number of institutions in the United States and United Kingdom, with near-absence of engagement from researchers based in East Asia, South Asia, or the Global South. The community was also demographically narrow. These concentrations shaped which problems were identified as central and which risk scenarios were taken as paradigmatic. Non-Anglophone AI research traditions—including substantial work on AI governance and ethics in China, India, and elsewhere—were largely absent from the literature and conferences of this period.
Key Organizations Founded (2000–2015)
The period saw establishment of several institutions focused on AI safety and related existential risks:
| Organization | Founded | Primary Focus |
|---|---|---|
| MIRI (originally SIAI) | 2000 | Agent foundations, decision theory, formal approaches to AI alignment |
| Future of Humanity Institute | 2005 | Existential risk research including AI safety, hosted at Oxford University; closed 2024 |
| Centre for the Study of Existential Risk | 2012 | Cambridge University-based research center for existential risks |
| Future of Life Institute | 2014 | AI safety research funding, policy advocacy, and coordination |
| OpenAI | 2015 | AI research with stated mission to ensure AGI benefits all humanity |
Other relevant organizations founded during this period include:
- LessWrong (2009): Online community for rationality and AI safety discussion
- Centre for Effective Altruism (2012): Organization promoting effective altruism, which prioritized AI safety as a cause area
- 80,000 Hours (2011): Career advice organization that promoted AI safety research as a high-impact career path
Note that several of these organizations underwent significant changes after 2015: the Future of Humanity Institute closed in 2024, and OpenAI's structure and mission evolved substantially following its transition to a capped-profit model.
Connection to Effective Altruism
The effective altruism movement, which emerged around 2011, adopted AI safety as a top priority based on several considerations:
- High expected value due to the potentially large magnitude of impact
- Relative neglect compared to the scale of potential consequences
- Tractability uncertainty, but with possibility of substantial progress through additional research
This connection created a pipeline of talent into AI safety research, as effective altruism organizations like 80,000 Hours recommended AI safety careers to people seeking to maximize their positive impact. The relationship between effective altruism and AI safety would strengthen further in subsequent years.
The connection also generated criticism. Some researchers expressed concern that the prioritization of AI safety within effective altruism circles had more to do with the persuasiveness of particular arguments circulating within those communities than with independent assessment of risk. Others argued that the demographic overlap between effective altruism and the AI safety community created a feedback loop in which research conclusions and funding priorities reinforced each other without sufficient external checks.
Transition to the Deep Learning Era
Changes Beginning in 2015
The period around 2015 marked the beginning of a transition in AI capabilities and consequently in AI safety research priorities. Deep learning techniques were demonstrating performance on tasks previously considered difficult or impossible for AI systems.
AlphaGo's defeat of professional Go player Fan Hui was reported in a Nature paper published in January 2016, followed by AlphaGo's match against world champion Lee Sedol in March 2016. These results arrived earlier than a range of expert predictions had suggested, prompting AI safety researchers to reconsider assumptions about the pace of capability development. The GPT series of language models would emerge in subsequent years, demonstrating capabilities in natural language that challenged assumptions about what machine learning systems could achieve.
These developments created pressure for AI safety research to engage more directly with practical machine learning systems rather than focusing exclusively on theoretical scenarios. The question shifted from "how do we build safe AGI someday" toward "how do we make current systems safer while preparing for rapid capability growth." This transition also exposed a tension in the field's prior work: the agent foundations research program had been developed largely in abstraction from the neural network architectures that were proving most capable, and it was not obvious how its theoretical results could be applied to the systems now attracting widespread attention. The subsequent period saw efforts to connect formal safety concerns to interpretability, RLHF, and scalable oversight research programs that engaged more directly with how large models actually work.
Legacy and Historical Significance
Institutional Establishment
The MIRI era established AI safety as a field with dedicated organizations rather than scattered individual researchers. This institutional foundation enabled sustained research programs, community building, and coordination that would not have been possible through individual efforts alone.
Intellectual Frameworks
Core concepts developed or popularized during this period continue to influence AI safety discourse:
- The orthogonality thesis (separating intelligence from goals)
- Instrumental convergence (predicting common sub-goals)
- The alignment problem (specifying human values in AI systems)
- Takeoff scenarios (paths to advanced AI)
- Existential risk framing (potential for permanent negative outcomes)
These concepts provide vocabulary and analytical frameworks for contemporary discussions of AI risk and safety. Their influence has been uneven: some, like the instrumental convergence thesis, have been broadly adopted even by researchers who otherwise distance themselves from MIRI's research program; others, like CEV and the tiling agents problem, remain more narrowly cited.
Academic Visibility
By 2015, AI safety had achieved increased academic visibility. Academic institutions hosted safety research groups, conferences included safety-focused tracks, and prominent researchers from mainstream AI engaged with safety questions. This shift was significant for attracting both researchers and funding. However, the field remained small relative to mainstream ML, and the theoretical frameworks developed during this period had limited uptake outside the immediate community.
The characterization of this shift requires some care: while the period produced genuine institutional progress, significant portions of the mainstream AI research community continued to view existential risk concerns as speculative, and the 2015 open letter represented a notable but bounded form of engagement rather than a wholesale conversion of the field.
Public Awareness
Mainstream media coverage and statements from prominent scientists and technology leaders brought AI safety concerns to general audiences. This visibility helped with researcher recruitment and funding, though it also generated controversy and contributed to polarization of opinion about the field's importance—with some critics arguing that high-profile statements from non-specialists produced more heat than light in public discourse about AI risk.
Open Questions and Limitations
Despite these achievements, significant challenges remained entering the subsequent period:
- Limited connection between theoretical safety research and practical machine learning systems
- Continued skepticism from most AI researchers about the urgency of safety work
- Uncertainty about whether sufficient progress could be made before advanced AI systems emerged
- Need to develop governance frameworks in addition to technical solutions
- Questions about whether the period's primary research agenda—agent foundations and formal analysis of idealized rational agents—was well-suited to the AI systems that were actually emerging
- Questions about how to ensure safety research kept pace with rapidly advancing capabilities
- Absence of engagement with governance, policy, and international coordination challenges that would prove important in subsequent years
These open questions shaped the research agenda of the subsequent period, as the field adapted to the realities of deep learning systems rather than the idealized rational agents that had anchored earlier theoretical work. The deep learning era brought both new opportunities—technical safety research could now be grounded in actual systems—and new pressures, as the pace of capability development accelerated in ways that shortened available timelines for foundational work.