Longterm Wiki · Updated 2026-03-12
Summary

Comprehensive biography of Paul Christiano documenting his technical contributions (IDA, debate, scalable oversight), his risk assessment (~10-20% P(doom), AGI in the 2030s-2040s), and his evolution from early optimism to current moderate concern. Documents the implementation of his ideas at major labs (RLHF at OpenAI, Constitutional AI at Anthropic) with specific citations to papers and organizational impact.


Paul Christiano

Person

Role: Founder
Known For: Iterated amplification, AI safety via debate, scalable oversight
Related Organizations: Alignment Research Center
Related Safety Agendas: Scalable Oversight
Related People: Eliezer Yudkowsky, Jan Leike

Overview

Paul Christiano is one of the most influential researchers in AI alignment, known for developing concrete, empirically testable approaches to the alignment problem. With a PhD in theoretical computer science from UC Berkeley, he worked as a research scientist at OpenAI, founded the Alignment Research Center (ARC), and in 2024 became Head of AI Safety at the US AI Safety Institute.

Christiano pioneered the "prosaic alignment" approach—aligning AI without requiring exotic theoretical breakthroughs. His current risk assessment places ~10-20% probability on existential risk from AI this century, with AGI arrival in the 2030s-2040s. His work has directly influenced alignment research programs at major labs including OpenAI, Anthropic, and DeepMind.

Risk Assessment

| Risk Factor | Christiano's Assessment | Evidence/Reasoning | Comparison to Field |
|---|---|---|---|
| P(doom) | ≈10-20% | Alignment tractable but challenging | Moderate (vs 50%+ doomers, <5% optimists) |
| AGI Timeline | 2030s-2040s | Gradual capability increase | Mainstream range |
| Alignment Difficulty | Hard but tractable | Iterative progress possible | More optimistic than MIRI |
| Coordination Feasibility | Moderately optimistic | Labs have incentives to cooperate | More optimistic than average |

Key Technical Contributions

Iterated Amplification and Distillation (IDA)

Published in "Supervising strong learners by amplifying weak experts" (2018):

| Component | Description | Status |
|---|---|---|
| Human + AI Collaboration | Human overseer works with AI assistant on complex tasks | Tested at scale by OpenAI |
| Distillation | Extract human+AI behavior into a standalone AI system | Standard ML technique |
| Iteration | Repeat the process with increasingly capable systems | Theoretical framework |
| Bootstrapping | Build aligned AGI from aligned weak systems | Core theoretical hope |

Key insight: If we can align a weak system and use it to help align slightly stronger systems, we can bootstrap to aligned AGI without solving the full problem directly.
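This loop can be illustrated with a toy program. The list-summing task, the lookup-table `distill`, and all names below are illustrative assumptions for the sketch, not code from Christiano's paper.

```python
# Toy sketch of the IDA loop. A "weak model" can only sum very short
# lists; amplification has a decomposer split a large task into pieces
# the model can handle; distillation trains a standalone system (here,
# a lookup table) to imitate the amplified human+AI behavior.

def weak_model(nums):
    # The base system solves only trivial tasks (lists of length <= 2).
    return sum(nums) if len(nums) <= 2 else None

def amplify(model, nums):
    # Human + AI collaboration: decompose, delegate subtasks, recombine.
    if len(nums) <= 2:
        return model(nums)
    mid = len(nums) // 2
    return amplify(model, nums[:mid]) + amplify(model, nums[mid:])

def distill(amplified, training_tasks):
    # Distillation stand-in: cache the amplified system's answers so a
    # standalone "model" reproduces them without the human in the loop.
    table = {tuple(t): amplified(t) for t in training_tasks}
    return lambda nums: table[tuple(nums)]

tasks = [[1, 2, 3, 4], [5, 6, 7, 8, 9]]
stronger_model = distill(lambda t: amplify(weak_model, t), tasks)
print(stronger_model([5, 6, 7, 8, 9]))  # 35
```

Iterating, the distilled model would replace `weak_model` in the next round of amplification, which is the bootstrapping step the table above calls the core theoretical hope.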

AI Safety via Debate

Co-developed with Geoffrey Irving (then at OpenAI, later DeepMind) in "AI safety via debate" (2018):

| Mechanism | Implementation | Results |
|---|---|---|
| Adversarial Training | Two AIs argue for different positions | Deployed at Anthropic |
| Human Judgment | Human evaluates which argument is more convincing | Scales human oversight capability |
| Truth Discovery | Debate incentivizes finding flaws in opponent arguments | Mixed empirical results |
| Scalability | Works even when AIs are smarter than humans | Theoretical hope |
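A minimal toy version of the protocol: the judge never computes the answer itself, only checks the single comparison each debater offers as evidence. Everything here is an illustrative assumption, not code from the Irving, Christiano, and Amodei paper.

```python
# Toy debate over "what is the largest element of the list?". Each
# debater names a claimed maximum; to refute the opponent, a debater
# exhibits one element exceeding the rival claim. The judge verifies
# only that single comparison, so limited oversight can still identify
# the honest answer.

def debate(nums, claim_a, claim_b):
    evidence_vs_a = next((x for x in nums if x > claim_a), None)
    evidence_vs_b = next((x for x in nums if x > claim_b), None)
    if evidence_vs_a is not None and evidence_vs_b is None:
        return claim_b  # A was refuted by a concrete counterexample
    if evidence_vs_b is not None and evidence_vs_a is None:
        return claim_a  # B was refuted
    return max(claim_a, claim_b)  # fall back when neither is refuted

print(debate([3, 41, 12, 9], claim_a=12, claim_b=41))  # 41 (honest claim wins)
```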

Scalable Oversight Framework

Christiano's broader research program on supervising superhuman AI:

| Problem | Proposed Solution | Current Status |
|---|---|---|
| Task too complex for direct evaluation | Process-based feedback vs outcome evaluation | Implemented at OpenAI |
| AI reasoning opaque to humans | Eliciting Latent Knowledge (ELK) | Active research area |
| Deceptive alignment | Recursive reward modeling | Early stage research |
| Capability-alignment gap | Assistance games framework | Theoretical foundation |
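The contrast between process-based and outcome-based feedback in the first row can be shown with a toy grader. The arithmetic trace and both reward functions below are illustrative assumptions, not OpenAI's actual implementation.

```python
# Toy contrast between outcome- and process-based feedback. The model is
# asked to compute (2 + 2) * 3 - 1 = 11. Its trace contains two
# compensating errors, so the final answer is right for the wrong
# reasons: outcome supervision gives full reward, process supervision
# flags the bad intermediate step.

steps = [("2 + 2", 4), ("4 * 3", 13), ("13 - 2", 11)]  # step 2 is wrong

def outcome_reward(steps, expected_final):
    # Outcome supervision: score only the final answer.
    return 1.0 if steps[-1][1] == expected_final else 0.0

def process_reward(steps):
    # Process supervision: check every intermediate step's arithmetic.
    checks = [eval(expr) == claimed for expr, claimed in steps]
    return sum(checks) / len(checks)

print(outcome_reward(steps, 11))        # 1.0 -- the flawed trace looks perfect
print(round(process_reward(steps), 2))  # 0.67 -- the bad step is caught
```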

Intellectual Evolution and Current Views

Early Period (2016-2019)

  • Higher optimism: Alignment seemed more tractable
  • IDA focus: Believed iterative amplification could solve core problems
  • Less doom: Lower estimates of catastrophic risk

Current Period (2020-Present)

| Shift | From | To | Evidence |
|---|---|---|---|
| Risk assessment | ≈5% P(doom) | ≈10-20% P(doom) | "What failure looks like" |
| Research focus | IDA/Debate | Eliciting Latent Knowledge | ARC's ELK report |
| Governance views | Lab-focused | Broader coordination | Recent policy writings |
| Timelines | Longer | Shorter (2030s-2040s) | Following capability advances |

Strategic Disagreements in the Field

Can we learn alignment iteratively?

  • Paul Christiano: Yes; the alignment tax should be acceptable, and we can catch problems in weaker systems. Rationale: prosaic alignment through iterative improvement. Confidence: medium-high
  • Eliezer Yudkowsky: No; sharp capability jumps mean we won't get useful feedback. Rationale: deceptive alignment, treacherous turns, and alignment being anti-natural. Confidence: high
  • Jan Leike: Yes, but we need to move fast because capabilities are advancing rapidly. Rationale: similar to Christiano's view but with more urgency given the current pace. Confidence: medium

Core Crux Positions

| Issue | Christiano's View | Alternative Views | Implication |
|---|---|---|---|
| Alignment difficulty | Prosaic solutions sufficient | Need fundamental breakthroughs (MIRI) | Different research priorities |
| Takeoff speeds | Gradual, time to iterate | Fast, little warning | Different preparation strategies |
| Coordination feasibility | Moderately optimistic | Pessimistic (racing dynamics) | Different governance approaches |
| Current system alignment | Meaningful progress possible | Current systems too limited | Different research timing |

Research Influence and Impact

Direct Implementation

| Technique | Organization | Implementation | Results |
|---|---|---|---|
| RLHF | OpenAI | InstructGPT, ChatGPT | Massive improvement in helpfulness |
| Constitutional AI | Anthropic | Claude training | Reduced harmful outputs |
| Debate methods | DeepMind | Sparrow | Mixed results on truthfulness |
| Process supervision | OpenAI | Math reasoning | Better than outcome supervision |
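The preference-comparison step underlying RLHF can be sketched as a Bradley-Terry loss. The function below is a hand-rolled illustration, not OpenAI's or Anthropic's implementation; real RLHF pairs a learned neural reward model with policy optimization such as PPO.

```python
# Minimal sketch of the preference-learning step behind RLHF: given a
# human judgment that response A beats response B, a Bradley-Terry loss
# pushes the reward model to score A above B.

import math

def preference_loss(score_preferred, score_rejected):
    # Negative log-probability (under Bradley-Terry) that the
    # human-preferred response wins: -log sigmoid(s_pref - s_rej).
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A larger correct margin gives a lower loss, so gradient descent on
# this loss widens the gap between preferred and rejected responses.
print(preference_loss(2.0, 0.0) < preference_loss(0.5, 0.0))  # True
```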

Intellectual Leadership

  • AI Alignment Forum: Primary venue for technical alignment discourse
  • Mentorship: mentored and collaborated with researchers now at major labs (Jan Leike, Geoffrey Irving, others)
  • Problem formulation: ELK problem now central focus across field

Current Research Agenda (2024)

At ARC, before his 2024 move to the US AI Safety Institute, Christiano's priorities included:

| Research Area | Specific Focus | Timeline |
|---|---|---|
| Power-seeking evaluation | Understanding how AI systems could gain influence gradually | Ongoing |
| Scalable oversight | Better techniques for supervising superhuman systems | Core program |
| Alignment evaluation | Metrics for measuring alignment progress | Near-term |
| Governance research | Coordination mechanisms between labs | Policy-relevant |

Key Uncertainties and Cruxes

Christiano identifies several critical uncertainties:

| Uncertainty | Why It Matters | Current Evidence |
|---|---|---|
| Deceptive alignment prevalence | Determines safety of iterative approach | Mixed signals from current systems |
| Capability jump sizes | Affects whether we get warning | Continuous but accelerating progress |
| Coordination feasibility | Determines governance strategies | Some positive signs |
| Alignment tax magnitude | Economic feasibility of safety | Early evidence suggests low tax |

Timeline and Trajectory Assessment

Near-term (2024-2027)

  • Continued capability advances in language models
  • Better alignment evaluation methods
  • Industry coordination on safety standards

Medium-term (2027-2032)

  • Early agentic AI systems
  • Critical tests of scalable oversight
  • Potential governance frameworks

Long-term (2032-2040)

  • Approach to transformative AI
  • Make-or-break period for alignment
  • International coordination becomes crucial

Comparison with Other Researchers

| Researcher | P(doom) | Timeline | Alignment Approach | Coordination View |
|---|---|---|---|---|
| Paul Christiano | ≈15% | 2030s | Prosaic, iterative | Moderately optimistic |
| Eliezer Yudkowsky | ≈90% | 2020s | Fundamental theory | Pessimistic |
| Dario Amodei | ≈10-25% | 2030s | Constitutional AI | Industry-focused |
| Stuart Russell | ≈20% | 2030s | Provable safety | Governance-focused |

Sources & Resources

Key Publications

| Publication | Year | Venue | Impact |
|---|---|---|---|
| Supervising strong learners by amplifying weak experts | 2018 | arXiv | Foundation for IDA |
| AI safety via debate | 2018 | arXiv | Debate framework |
| What failure looks like | 2019 | Alignment Forum | Risk assessment update |
| Eliciting Latent Knowledge | 2021 | ARC | Current research focus |

| Category | Links |
|---|---|
| Research Organization | Alignment Research Center |
| Blog/Writing | AI Alignment Forum, Personal blog |
| Academic | Google Scholar |
| Social | Twitter |

| Area | Connection to Christiano's Work |
|---|---|
| Scalable oversight | Core research focus |
| Reward modeling | Foundation for many proposals |
| AI governance | Increasing focus area |
| Alignment evaluation | Critical for iterative approach |


Structured Data


All Facts

People
| Property | Value | As Of |
|---|---|---|
| Role / Title | Head of AI Safety, US AI Safety Institute | Feb 2024 |
| Role / Title (earlier) | Founder, Alignment Research Center | Oct 2021 |
| Employed By | US AI Safety Institute | Feb 2024 |
| Employed By (earlier) | Alignment Research Center | Oct 2021 |
| Employed By (earlier) | OpenAI | Jan 2017 |
Biographical
| Property | Value |
|---|---|
| Birth Year | 1992 |
| Education | PhD in Computer Science, UC Berkeley; BS in Mathematics, MIT |
| Notable For | Pioneer of RLHF and AI alignment research; founder of the Alignment Research Center (ARC); key theorist of iterated amplification and eliciting latent knowledge |
| Social Media | @paulfchristiano |
| Wikipedia | https://en.wikipedia.org/wiki/Paul_Christiano |
| Google Scholar | https://scholar.google.com/citations?user=6gHkYDgAAAAJ |
General
| Property | Value |
|---|---|
| Website | https://paulfchristiano.com |

Career History

| Organization | Title | Start | End |
|---|---|---|---|
| OpenAI | Research Scientist | 2017 | Oct 2021 |
| Alignment Research Center | Founder & President | Oct 2021 | |
| US AI Safety Institute | Head of AI Safety | Feb 2024 | |

Related Pages

Top Related Pages

Organizations

METR · NIST and AI Safety

Analysis

Model Organisms of Misalignment · Capability-Alignment Race Model

Approaches

AI Alignment · AI Safety via Debate

Concepts

AI Timelines · Existential Risk from AI · Agentic AI · Large Language Models

Risks

Deceptive Alignment · AI Development Racing Dynamics

Key Debates

AI Alignment Research Agendas · AI Accident Risk Cruxes · The Case For AI Existential Risk · Why Alignment Might Be Hard

Policy

Voluntary AI Safety Commitments

Safety Research

AI Control

Historical

Deep Learning Revolution Era · The MIRI Era