Summary
Conjecture is a 30-40 person London-based AI safety org founded in 2022, pursuing Cognitive Emulation (CoEm) - building interpretable AI from the ground up rather than aligning LLMs - with $30M+ in Series A funding. Founded by Connor Leahy (EleutherAI), it faces high uncertainty about CoEm competitiveness (3-5 year timeline) and commercial pressure risks.
Overview
Conjecture is an AI safety research organization founded in 2022 by Connor Leahy and a team of researchers concerned about existential risks from advanced AI. The organization pursues a distinctive technical approach centered on "Cognitive Emulation" (CoEm): building interpretable AI systems based on principles of human cognition rather than aligning existing large language models.
Based in London with a team of 30-40 researchers, Conjecture raised over $30M in Series A funding in 2023. Its research agenda emphasizes mechanistic interpretability and understanding neural network internals, representing a fundamental alternative to the mainstream prosaic alignment approaches pursued by organizations like Anthropic and OpenAI.
Risk Assessment
| Risk Category | Severity | Likelihood | Timeline | Trend |
|---|---|---|---|---|
| CoEm Uncompetitive | High | Moderate | 3-5 years | Uncertain |
| Commercial Pressure Compromise | Medium | High | 2-3 years | Worsening |
| Research Insularity | Low | Moderate | Ongoing | Stable |
| Funding Sustainability | Medium | Low | 5+ years | Improving |
Founding and Evolution
Origins (2022)
Conjecture emerged from the EleutherAI collective, an open-source AI research group best known for releasing open replications of GPT-3-class models (GPT-J, GPT-NeoX). Key founding factors:
| Factor | Impact | Details |
|---|---|---|
| EleutherAI Experience | High | Demonstrated feasibility of capability replication |
| Safety Concerns | High | Recognition of risks from capability proliferation |
| European Gap | Medium | Limited AI safety ecosystem outside the Bay Area |
| Funding Availability | Medium | Growing investor interest in AI safety |
Philosophical Evolution: The transition from EleutherAI's "democratize AI" mission to Conjecture's safety-focused approach represents a significant shift in thinking about AI development and publication strategies.
Funding Trajectory
| Year | Funding Stage | Amount | Impact |
|---|---|---|---|
| 2021 | Seed | Undisclosed | Initial team of ≈15 researchers |
| 2023 | Series A | $30M+ | Scaled to 30-40 researchers |
| 2024 | Operating | Ongoing | Sustained research operations |
Cognitive Emulation (CoEm) Research Agenda
Research Areas
| Name | Description | Started |
|---|---|---|
| Cognitive Emulation (CoEm) | Approach to building AI systems that emulate human cognitive processes rather than optimizing black-box objectives | Mar 2022 |
Core Philosophy
Conjecture's signature approach contrasts sharply with mainstream AI development; a toy sketch of the contrast follows the table:

| Approach | Philosophy | Methods | Evaluation |
|---|---|---|---|
| Prosaic Alignment | Train powerful LLMs, align post-hoc | RLHF, Constitutional AI | Behavioral testing |
| Cognitive Emulation | Build interpretable systems from the ground up | Human cognition principles | Mechanistic understanding |
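To make the contrast concrete, the sketch below is a purely illustrative toy - an assumption about the general shape of the approach, not any published Conjecture design. One path answers a question with a single opaque call; the other decomposes the task into narrow steps whose intermediate results are logged for human audit. All class and function names are hypothetical.

```python
# Illustrative only: contrasts an opaque end-to-end call with a CoEm-style
# decomposition into narrow, auditable steps. Names are hypothetical and do
# not reflect Conjecture's actual systems.
from dataclasses import dataclass, field


@dataclass
class AuditLog:
    """Records every intermediate step so a human can inspect the reasoning."""
    steps: list = field(default_factory=list)

    def record(self, name: str, value: str) -> None:
        self.steps.append((name, value))


def blackbox_answer(question: str) -> str:
    # Prosaic-style: one opaque model call; safety is assessed only by
    # testing the observable behaviour of the output.
    return f"<opaque model output for: {question!r}>"


def coem_style_answer(question: str, log: AuditLog) -> str:
    # CoEm-style: the task is broken into explicit, human-legible steps,
    # each narrow enough to bound and inspect on its own.
    log.record("parse", f"task understood as: {question!r}")
    log.record("plan", "retrieve relevant facts, then compare candidate answers")
    log.record("check", "verify each candidate against the retrieved facts")
    answer = "answer assembled from the audited steps above"
    log.record("answer", answer)
    return answer


if __name__ == "__main__":
    log = AuditLog()
    print(blackbox_answer("Which deployment option is safer?"))
    print(coem_style_answer("Which deployment option is safer?", log))
    for name, value in log.steps:
        print(f"  {name}: {value}")
```

The substantive difference is where safety evidence comes from: behavioural testing of the opaque output versus inspection and bounding of each intermediate step.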
Key Research Components

**Mechanistic Interpretability** (a minimal sketch of this style of analysis follows the lists below)
- Circuit discovery in neural networks
- Feature attribution and visualization
- Scaling interpretability to larger models
- Interpretability research collaboration

**Architecture Design**
- Modular systems for better control
- Interpretability-first design choices
- Trading capabilities for understanding
- Novel training methodologies

**Model Organisms**
- Smaller, interpretable test systems
- Alignment property verification
- Deception detection research
- Goal representation analysis
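As noted above, the sketch below illustrates one common style of analysis behind circuit discovery: a toy activation-patching experiment asking which internal activation carries a behaviour. It is a generic illustration that assumes PyTorch is available; it is not Conjecture's tooling, and the model and variable names are hypothetical.

```python
# Minimal activation-patching sketch on a toy two-layer network: the kind of
# "which internal activation carries the behaviour?" question that circuit
# discovery asks. Generic illustration only; not Conjecture's codebase.
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyModel(nn.Module):
    def __init__(self, d_in=8, d_hidden=16, d_out=2):
        super().__init__()
        self.layer1 = nn.Linear(d_in, d_hidden)
        self.layer2 = nn.Linear(d_hidden, d_out)

    def forward(self, x, patch_hidden=None):
        h = torch.relu(self.layer1(x))
        if patch_hidden is not None:
            h = patch_hidden  # overwrite the hidden activation with a cached one
        return self.layer2(h)


model = TinyModel()
clean_input = torch.randn(1, 8)
corrupt_input = torch.randn(1, 8)

with torch.no_grad():
    # Cache the hidden activation from the "clean" run.
    clean_hidden = torch.relu(model.layer1(clean_input))
    clean_out = model(clean_input)
    corrupt_out = model(corrupt_input)
    # Patch the clean activation into the corrupted run: if the output moves
    # back toward the clean behaviour, that activation is causally relevant
    # and becomes a target for finer-grained circuit analysis.
    patched_out = model(corrupt_input, patch_hidden=clean_hidden)

print("clean:  ", clean_out)
print("corrupt:", corrupt_out)
print("patched:", patched_out)
```

Real circuit work applies the same patch-and-compare logic to attention heads and MLP neurons inside trained transformers rather than a toy two-layer network.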
Key Personnel
Key People

| Person | Title | Start | Founder |
|---|---|---|---|
| Connor Leahy | CEO & Co-founder | Mar 2022 | ✓ |
| Sid Black | Co-founder | Mar 2022 | ✓ |
| Gabriel Alfour | Co-founder | Mar 2022 | ✓ |
Leadership Team
| Person | Role | Background |
|---|---|---|
| Connor Leahy | CEO and Co-founder | EleutherAI, autodidact ML researcher |
| Sid Black | Co-founder | EleutherAI technical researcher |
| Gabriel Alfour | CTO | Former Tezos CTO, systems engineering |
Connor Leahy Profile
| Aspect | Details |
|---|---|
| Background | EleutherAI collective member, GPT-J contributor |
| Evolution | From open-source advocacy to safety-focused research |
| Public Role | Active AI policy engagement, podcast appearances |
| Views | Short AI timelines, high P(doom), interpretability viewed as necessary |
Timeline Estimates: Leahy has consistently expressed short AI timeline views, suggesting AGI within years rather than decades.
Research Focus Areas
Mechanistic Interpretability
| Research Area | Status | Key Questions |
|---|---|---|
| Circuit Analysis | Active | How do transformers implement reasoning? |
| Feature Extraction | Ongoing | What representations emerge in training? |
| Scaling Methods | Development | Can interpretability scale to AGI-level systems? |
| Goal Detection | Early | How can we detect goal-directedness mechanistically? |
Comparative Advantages
| Organization | Primary Focus | Interpretability Approach |
|---|---|---|
| Conjecture | CoEm, ground-up interpretability | Design-time interpretability |
| Anthropic | Frontier models + interpretability | Post-hoc analysis of LLMs |
| ARC | Theoretical alignment | Evaluation and ELK research |
| Redwood | AI control | Interpretability for control |
Strategic Position
Theory of Change
Conjecture's pathway to AI safety impact:
1. Develop scalable interpretability techniques for powerful AI systems
2. Demonstrate CoEm viability as a competitive alternative to black-box scaling
3. Influence field direction toward interpretability-first development
4. Inform governance with technical feasibility insights
5. Build safe systems using CoEm principles if successful
European AI Safety Hub
| Role | Impact | Examples |
|---|---|---|
| Geographic Diversity | High | Alternative to Bay Area concentration |
| Policy Engagement | Growing | UK AISI consultation |
| Talent Development | Moderate | European researcher recruitment |
| Community Building | Early | Workshops and collaborations |
Challenges and Criticisms
Technical Feasibility
| Challenge | Severity | Status |
|---|---|---|
| CoEm Competitiveness | High | Unresolved - early stage |
| Interpretability Scaling | High | Active research question |
| Human Cognition Complexity | Medium | Ongoing investigation |
| Timeline Alignment | High | Critical if AGI timelines are short |
Organizational Tensions
**Commercial Pressure vs Safety Mission**
- VC funding creates return expectations
- Potential future deployment pressure
- Comparison to Anthropic's commercialization path

**Publication Strategy Criticism**
- Shift from EleutherAI's radical openness
- Selective research sharing decisions
- Balance between transparency and safety
Current Research Outputs
Published Work
| Type | Focus | Impact |
|---|---|---|
| Technical Papers | Interpretability methods | Research community |
| Blog Posts | CoEm explanations | Public understanding |
| Policy Contributions | Technical feasibility | Governance decisions |
| Open Source Tools | Interpretability software | Research ecosystem |
Research Questions
Key Questions
- Can CoEm produce AI systems competitive with scaled LLMs?
- Is mechanistic interpretability sufficient for AGI safety verification?
- How will commercial pressures affect Conjecture's research direction?
- What role should interpretability play in AI governance frameworks?
- Can cognitive emulation bridge neuroscience and AI safety research?
- How does CoEm relate to other alignment approaches like Constitutional AI?
Timeline and Risk Estimates
Leadership Risk Assessments
Conjecture's leadership has articulated clear views on AI timelines and safety approaches, which fundamentally motivate their Cognitive Emulation research agenda and organizational strategy:
| Expert/Source | Estimate | Reasoning |
|---|---|---|
| Connor Leahy | AGI: 2-10 years | Leahy has consistently expressed short AI timeline views across public statements and podcasts from 2023-2024, suggesting transformative AI systems could emerge within years rather than decades. These short timelines create urgency for developing interpretability-first approaches before AGI arrives. |
| Connor Leahy | P(doom): High without major changes | Leahy has expressed significant concern about the default trajectory of AI development in 2023 statements, arguing that prosaic alignment approaches pursued by frontier labs are insufficient to ensure safety. This pessimism about conventional alignment motivates Conjecture's alternative CoEm approach. |
| Conjecture Research | Prosaic alignment: Insufficient | The organization's core research direction reflects a fundamental assessment that post-hoc alignment of large language models through techniques like RLHF and Constitutional AI cannot provide adequate safety guarantees. This view, maintained since founding, drives their pursuit of interpretability-first system design. |
| Organization | Interpretability: Necessary for safety | Conjecture's founding premise holds that mechanistic interpretability is not merely useful but necessary for AI safety verification. This fundamental research assumption distinguishes them from organizations pursuing behavioral safety approaches and shapes their entire technical agenda. |
Future Scenarios
Research Trajectory Projections
| Timeline | Optimistic | Realistic | Pessimistic |
|---|---|---|---|
| 2-3 years | CoEm demonstrations, policy influence | Continued interpretability advances | Commercial pressure compromises |
| 3-5 years | Competitive interpretable systems | Mixed results, partial success | Research agenda stagnates |
| 5+ years | Field adoption of CoEm principles | Portfolio contribution to safety | Marginalized approach |
Critical Dependencies
| Factor | Importance | Uncertainty |
|---|---|---|
| Technical Feasibility | Critical | High - unproven at scale |
| Funding Continuity | High | Medium - VC expectations |
| AGI Timeline | Critical | High - if very short, insufficient time |
| Field Receptivity | Medium | Medium - depends on results |
Relationships and Collaborations
Within AI Safety Ecosystem
| Organization | Relationship | Collaboration Type |
|---|---|---|
| Anthropic | Friendly competition | Interpretability research sharing |
| ARC | Complementary | Different technical approaches |
| MIRI | Aligned concerns | Skepticism of prosaic alignment |
| Academic Labs | Collaborative | Interpretability technique development |
Policy and Governance
- **UK Engagement**: UK AI Safety Institute consultation; LessWrong posts; GovAI analysis
- **Technical Interpretability**: interpretability research; Why Alignment is Hard; AI Alignment Forum
- **European AI Policy**: UK AISI; EU AI Office