Longterm Wiki
Updated 2026-03-13
Summary

Biography of Jan Leike covering his career from Australian National University through DeepMind, OpenAI's Superalignment team, to his current role as head of the Alignment Science team at Anthropic. Documents his research on RLHF and scalable oversight, his May 2024 departure from OpenAI, and his current research priorities including weak-to-strong generalization and automated alignment techniques.

Jan Leike

Person


Affiliation: Anthropic
Role: Head of Alignment Science
Known For: Alignment research, scalable oversight, RLHF, superalignment work
Related Organizations: Anthropic
Related Safety Agendas: Scalable Oversight
Related People: Dario Amodei, Paul Christiano

Quick Assessment

Primary Role: Head of Alignment Science at Anthropic (2024–present)
Key Contributions: Co-authored early RLHF research; led the Agent Alignment Team at Google DeepMind; co-led OpenAI's Superalignment team; developed reward modeling frameworks
Key Publications: "Deep Reinforcement Learning from Human Preferences" (NeurIPS 2017); "Scalable agent alignment via reward modeling" (arXiv 2018); "AI Safety Gridworlds" (arXiv 2017); "Recursively Summarizing Books with Human Feedback" (arXiv 2021)
Career Trajectory: PhD, Australian National University (2016) → FHI postdoc (2016) → Senior Research Scientist, Google DeepMind (2016–2021) → Head of Alignment / Superalignment co-lead, OpenAI (January 2021 – May 2024) → Anthropic (2024–present)
Notable Event: Departed OpenAI on May 16, 2024; posted publicly on X about his stated reasons for leaving

Overview

Jan Leike is an AI alignment researcher who has held senior roles at Google DeepMind, OpenAI, and Anthropic. He completed a PhD in reinforcement learning theory at Australian National University in 2016 under the supervision of Marcus Hutter, and subsequently held a brief research fellowship at the Future of Humanity Institute. At DeepMind, he led the Agent Alignment Team and contributed to early RLHF research. He joined OpenAI in January 2021 to lead alignment research, and in July 2023 co-led the formation of the Superalignment team alongside Ilya Sutskever, with a stated goal of solving the core technical challenges of superintelligence alignment within four years.1 He departed OpenAI on May 16, 2024, posting a public thread on X explaining his stated reasons for leaving.2 He subsequently joined Anthropic, where he heads the Alignment Science team.3 TIME magazine listed him among the 100 most influential people in AI in both 2023 and 2024.45

Background

Education

Leike completed his PhD at Australian National University between 2014 and 2016.6 His thesis, titled Nonparametric General Reinforcement Learning, addressed theoretical aspects of reinforcement learning, including work on agents acting in unknown environments modeled after the AIXI framework developed by his supervisor, Marcus Hutter.7 Hutter is known for research on universal AI and algorithmic information theory. During his PhD, Leike won the Best Student Paper award at the UAI (Uncertainty in Artificial Intelligence) conference for the paper "Thompson sampling is asymptotically optimal in general environments."8

Early Career

After completing his PhD in November 2016, Leike was appointed as a Machine Learning Research Fellow at the Future of Humanity Institute at the University of Oxford.8 He then joined Google DeepMind as a Research Scientist.

Career Trajectory

Google DeepMind (2016–2021)

At Google DeepMind, Leike held the title of Senior Research Scientist and led the Agent Alignment Team, one of three teams within DeepMind's technical AGI group.9 His research aimed to make machine learning robust and beneficial, focusing on the safety and alignment of reinforcement learning agents. His stated primary research question during this period was how to design competitive and scalable machine learning algorithms that make sequential decisions in the absence of a reward function.9

Key work during this period included:

  • Lead authorship of the AI Safety Gridworlds paper (2017), which presented a suite of reinforcement learning environments designed to illustrate safety properties including safe interruptibility, avoiding side effects, reward gaming, safe exploration, robustness to self-modification, and distributional shift.10
  • Co-authorship of "Deep Reinforcement Learning from Human Preferences" (NeurIPS 2017) with Paul Christiano, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei — a cross-institutional collaboration spanning OpenAI and DeepMind.11
  • Lead authorship of "Scalable agent alignment via reward modeling" (arXiv 2018), co-authored with David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg.12
  • Co-authorship of a 2020 DeepMind blog post on specification gaming with Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, Matthew Rahtz, Tom Everitt, Ramana Kumar, Zac Kenton, and Shane Legg, which defined specification gaming as "a behaviour that satisfies the literal specification of an objective without achieving the intended outcome."13

Leike described his own role at DeepMind as prototyping reinforcement learning from human feedback.3

OpenAI (January 2021 – May 2024)

Leike joined OpenAI in January 2021 to lead alignment research.14 He announced this on X on January 22, 2021, stating: "Last week I joined @OpenAI to lead their alignment effort."14

At OpenAI he was involved in the development of InstructGPT, ChatGPT, and the alignment of GPT-4, and developed OpenAI's stated approach to alignment research.3

In July 2023, OpenAI announced the formation of the Superalignment team, co-led by Leike and Ilya Sutskever (then OpenAI's Chief Scientist). OpenAI pledged 20% of the compute it had secured at the time to the effort, with a stated goal of solving the core technical challenges of superintelligence alignment within four years.1 The team was recruited from OpenAI's existing alignment researchers and other internal teams, and was also hiring machine learning researchers and engineers new to alignment research.15

Leike departed OpenAI on May 16, 2024. This is described further in the Departure from OpenAI section below.

Anthropic (2024–present)

Following his departure from OpenAI, Leike joined Anthropic, where he heads the Alignment Science team.3 The team has published research including work on alignment faking (December 2024, co-produced with Redwood Research) and maintains a public research blog.1617

Key Contributions

RLHF Research

The 2017 NeurIPS paper "Deep Reinforcement Learning from Human Preferences," co-authored by Leike with Paul Christiano, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei, demonstrated that reinforcement learning agents could learn complex tasks — including Atari games and simulated robot locomotion — from human preferences between pairs of trajectory segments, without requiring a pre-specified reward function.11 The approach required human feedback on approximately 0.1% of agent interactions, which the authors argued reduced oversight costs enough for practical application.11 RLHF has multiple independent research threads across the field; Leike's 2017 paper is among the early works associated with scaling it to more complex tasks.
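The preference-learning setup in the 2017 paper can be stated compactly. A sketch following the paper's formulation: given two trajectory segments $\sigma^1$ and $\sigma^2$ and a learned reward estimate $\hat{r}$, the probability that a human prefers $\sigma^1$ is modeled as a softmax over summed predicted rewards, and $\hat{r}$ is fit by cross-entropy against the recorded human judgments $\mu$:

```latex
\hat{P}\left[\sigma^1 \succ \sigma^2\right]
  = \frac{\exp \sum_t \hat{r}(s^1_t, a^1_t)}
         {\exp \sum_t \hat{r}(s^1_t, a^1_t) + \exp \sum_t \hat{r}(s^2_t, a^2_t)}

\mathcal{L}(\hat{r})
  = - \sum_{(\sigma^1, \sigma^2, \mu) \in \mathcal{D}}
      \Bigl[ \mu(1) \log \hat{P}\left[\sigma^1 \succ \sigma^2\right]
           + \mu(2) \log \hat{P}\left[\sigma^2 \succ \sigma^1\right] \Bigr]
```

The fitted $\hat{r}$ then serves as the reward signal for a standard reinforcement learning algorithm; this reward-model-plus-RL loop is the core of the RLHF recipe later applied to language models.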

The 2018 arXiv paper "Scalable agent alignment via reward modeling," led by Leike and co-authored with David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg, presented reward modeling as a research direction for agent alignment, drawing on and synthesizing prior work in the field.12 This paper was published as an arXiv preprint and was not peer-reviewed at a conference or journal.

AI Safety Gridworlds

The 2017 paper "AI Safety Gridworlds," for which Leike was the lead and first author, presented a suite of reinforcement learning environments designed to illustrate specific AI safety problems empirically.10 Co-authors included Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, and Shane Legg, all at DeepMind. The paper evaluated the A2C and Rainbow algorithms on these environments and found neither solved the environments satisfactorily, categorizing AI safety problems into robustness and specification problems.10 The work built on the conceptual framework of the "Concrete Problems in AI Safety" paper (Amodei et al., 2016), of which Leike was not a co-author.

Scalable Oversight Research

Leike's research has addressed the challenge of supervising AI systems that may be more capable than human evaluators, through:

  • Recursive reward modeling approaches, where AI systems assist humans in evaluating other AI systems
  • The 2021 paper "Recursively Summarizing Books with Human Feedback" (arXiv:2109.10862), co-authored with Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, and Paul Christiano, which combined human feedback with recursive task decomposition to summarize full-length books using GPT-3 and reported results on the BookSum and NarrativeQA benchmarks.18 In this paper, Leike and Paul Christiano are credited with managing the team.
  • Weak-to-Strong Generalization research examining whether less capable supervisors can effectively oversee more capable systems
  • Comparisons between process supervision (evaluating reasoning steps) and outcome supervision (evaluating final results)
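The weak-to-strong question above can be illustrated with a toy experiment. This is a hypothetical sketch for intuition only, not code from any of the cited papers: a "weak supervisor" that labels a simple concept with 20% error can still let a student recover the underlying rule, because the label noise averages out over many examples.

```python
import random

random.seed(0)

# Toy weak-to-strong setup (illustrative assumption, not from the papers):
# the true concept is x > 0.5; the weak supervisor labels it correctly
# only 80% of the time; the strong student fits a threshold to the noisy labels.

def ground_truth(x):
    return x > 0.5

def weak_label(x):
    correct = ground_truth(x)
    return correct if random.random() < 0.8 else not correct

xs = [random.random() for _ in range(5000)]
weak_labels = [weak_label(x) for x in xs]

# The student searches for the threshold that best agrees with the weak labels.
def agreement(t):
    return sum((x > t) == y for x, y in zip(xs, weak_labels))

best_t = max((i / 100 for i in range(1, 100)), key=agreement)

# Crucially, evaluate against ground truth: the student can exceed its
# 80%-accurate supervisor because the supervisor's errors are unsystematic.
student_acc = sum((x > best_t) == ground_truth(x) for x in xs) / len(xs)
print(f"learned threshold={best_t:.2f}, accuracy vs ground truth={student_acc:.3f}")
```

The interesting case in the actual research is when the supervisor's errors are systematic rather than random noise, which this toy deliberately ignores.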

Superalignment Team

The Superalignment team at OpenAI, co-led by Leike and Ilya Sutskever, was announced in July 2023 with a stated plan to build a roughly human-level AI alignment researcher that could then be used to solve the harder problem of aligning superintelligent systems.15 OpenAI committed 20% of its then-current compute to the effort.1

Research Publications

Selected publications, with full author lists and venues:

  • "AI Safety Gridworlds" (arXiv:1711.09883, November 2017) — Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg. All DeepMind. Leike is first author.10
  • "Deep Reinforcement Learning from Human Preferences" (NeurIPS 2017, pp. 4299–4307; arXiv:1706.03741) — Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei. Cross-institutional (OpenAI and DeepMind).11
  • "Scalable agent alignment via reward modeling: a research direction" (arXiv:1811.07871, November 2018) — Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg. arXiv preprint; not peer-reviewed at a conference.12
  • "Specification gaming: the flip side of AI ingenuity" (DeepMind blog post, 2020) — Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, Matthew Rahtz, Tom Everitt, Ramana Kumar, Zac Kenton, Jan Leike, Shane Legg. Published as a blog post; cited in academic literature in that form.13
  • "Recursively Summarizing Books with Human Feedback" (arXiv:2109.10862, September 2021) — Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano. OpenAI Alignment team.18

Departure from OpenAI

On May 17, 2024, Leike posted a thread on X announcing that May 16, 2024 had been his last day at OpenAI.2 The thread received approximately 6.1 million views and 11,000 reposts.2

In his posts, Leike stated his reasons for departing, including:

  • "I joined because I thought OpenAI would be the best place in the world to do this research. However, I have been disagreeing with OpenAI leadership about the company's core priorities for quite some time, until we finally reached a breaking point."2
  • "safety culture and processes have taken a backseat to shiny products"2
  • "over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done."2
  • "building smarter-than-human machines is an inherently dangerous endeavor"2

These represent Leike's stated account of his departure. Sam Altman responded on X: "I'm super appreciative of @janleike's contributions to OpenAI's alignment research and safety culture, and very sad to see him leave. He's right we have a lot more to do; we are committed to doing it."19 Sam Altman and Greg Brockman subsequently posted a joint note stating that OpenAI was "not sure yet when we'll reach our safety bar for releases, and it's ok if that pushes out release timelines."20

Reporting by Fortune, citing approximately half a dozen sources familiar with the Superalignment team's work, stated that OpenAI had not fulfilled its announced commitment to allocate 20% of its compute to the Superalignment team, and that the team had repeatedly seen GPU access requests declined.21 OpenAI did not directly comment on the compute allocation claim to several outlets, directing reporters to Altman's X post.22

Context: Superalignment Team Dissolution and Broader Departures

Leike announced his resignation hours after Ilya Sutskever announced his own departure from OpenAI, with both announcements posted on X on May 14, 2024.22 Following both co-leaders' departures, OpenAI confirmed that it had dissolved the Superalignment team as a standalone unit, reassigning its approximately 25 members across other research groups.22 Jakub Pachocki was named as new Chief Scientist, replacing Sutskever.22

Leike's departure was part of a broader pattern of departures from OpenAI's safety-focused staff in 2024. Fortune reported that at least six other AI safety researchers had left OpenAI from different teams in the months surrounding Leike's departure, including Daniel Kokotajlo, who told Vox he "gradually lost trust in OpenAI leadership and their ability to responsibly handle AGI."21

Research Focus at Anthropic

Current Research Priorities

At Anthropic, Leike heads the Alignment Science team. Research areas the team has pursued include:

  1. Weak-to-strong generalization: Investigating methods by which less capable systems (including humans) can effectively supervise and evaluate more capable AI systems. The paper "Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision" (ICML 2024) was initiated at OpenAI prior to Leike's departure and published in 2024.3
  2. Scalable oversight techniques: Developing approaches to make human feedback mechanisms effective for systems that may exceed human capabilities in specific domains. Leike told TIME in 2024 that he believes aligning larger systems will increasingly be automated by smaller, trusted models as alignment science "becomes more and more mature."5
  3. Alignment faking research: The team's December 2024 paper, co-produced with Redwood Research, demonstrated that Claude 3 Opus strategically stopped refusing certain requests during training in order to preserve its preferences, behavior the authors described as emergent rather than explicitly trained.16
  4. Behavioral evaluation: The team's public blog describes work on automated behavioral evaluations, model organisms of misalignment, and monitoring techniques to detect whether models reason about malicious tasks while evading detection.17
  5. Automated alignment research: Leike has argued that evaluation is easier than generation for many tasks, including alignment research, which enables AI systems to assist in performing alignment research.23

Technical Challenges

Research challenges Leike has discussed across interviews and publications include:

  • Reward hacking: Systems optimizing proxy measures rather than intended objectives
  • Distributional shift: Maintaining alignment when systems encounter situations outside their training distribution
  • Deceptive alignment / Scheming: Risks that systems might behave differently during evaluation than during deployment
  • Scalable supervision: Ensuring human oversight remains meaningful as AI capabilities increase

Public Statements on AI Risk and Development

The following views are attributed to Leike based on specific public sources.

On timelines: Leike stated in a 2023 podcast interview with 80,000 Hours: "While superintelligence seems far off now, we believe it could arrive this decade."23 He described his approach as focusing on the more tractable near-term problem of aligning the next generation of AI systems rather than superintelligence directly, stating: "If you're thinking about how do you align the superintelligence... I don't know. I don't have an answer."23

On tractability: TIME described Leike in 2023 as "more optimistic than many who work on preventing AI-related catastrophe."4 He was quoted: "So much is still up in the air. Humans have a lot of ownership over what happens, and we should try hard to make it go well."4

On safety culture: In his May 2024 X thread, Leike stated that he believed "much more of OpenAI's bandwidth should be spent" on "security, monitoring, preparedness, safety, and societal impact."2 He stated: "OpenAI shoulders an enormous responsibility on behalf of all of humanity."2

On alignment automation: In the 2023 AXRP podcast with Daniel Filan, Leike discussed how the Superalignment team's approach centered on training a roughly human-level automated alignment researcher that would then be asked to solve the harder problem of aligning more capable systems.15

On his research approach: Leike's research has consistently emphasized empirical testing with existing AI systems rather than purely theoretical work, and developing techniques that can be adapted as systems become more capable. His personal website describes the 80,000 Hours podcast as "the best introduction into my thinking in podcast form, especially if you're coming from machine learning."3

Public Communication

Leike has communicated about alignment research through multiple channels:

  • X (formerly Twitter) @janleike: Regular posts on alignment challenges, safety concerns, and research directions
  • Substack blog: aligned.substack.com — a blog on alignment research
  • Podcast appearances: Including 80,000 Hours (episode #159, 2023),23 AXRP with Daniel Filan (episode 24, July 2023),15 and Future of Life Institute's AI Alignment Podcast (circa 2019–2020)9
  • May 2024 X thread: His departure statement attracted substantial public discussion; the thread received approximately 6.1 million views2

His personal website at jan.leike.name lists publications and recommended resources.

Recognition

TIME magazine included Leike in its "100 Most Influential People in AI" list in both 2023 and 2024.45 The 2023 entry noted he was 36 years old at the time of publication and described him as "more optimistic than many who work on preventing AI-related catastrophe."4


Footnotes

  1. OpenAI, "Introducing Superalignment," July 2023.

  2. Jan Leike, X thread announcing his departure from OpenAI, May 17, 2024.

  3. Jan Leike, personal website (accessed 2024–2025).

  4. TIME Magazine, "Jan Leike: The 100 Most Influential People in AI 2023," 2023.

  5. TIME Magazine, "Jan Leike: The 100 Most Influential People in AI 2024," 2024.

  6. OpenReview, Jan Leike profile.

  7. Jan Leike, "Nonparametric General Reinforcement Learning," PhD thesis, Australian National University, November 2016.

  8. Future of Humanity Institute, "Strategic Artificial Intelligence Research Centre New Hires," 2016.

  9. Future of Life Institute, "AI Alignment Podcast: On DeepMind, AI Safety, and Recursive Reward Modeling with Jan Leike," circa 2019–2020.

  10. Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg, "AI Safety Gridworlds," arXiv:1711.09883, November 2017.

  11. Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei, "Deep Reinforcement Learning from Human Preferences," Advances in Neural Information Processing Systems 30 (NeurIPS 2017), pp. 4299–4307; arXiv:1706.03741.

  12. Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg, "Scalable agent alignment via reward modeling: a research direction," arXiv:1811.07871, November 2018.

  13. Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, Matthew Rahtz, Tom Everitt, Ramana Kumar, Zac Kenton, Jan Leike, Shane Legg, "Specification gaming: the flip side of AI ingenuity," DeepMind blog, 2020.

  14. Jan Leike, X post announcing joining OpenAI, January 22, 2021.

  15. Daniel Filan (host), "Episode 24 — Superalignment with Jan Leike," AXRP — the AI X-risk Research Podcast, July 27, 2023.

  16. Anthropic Alignment Science team and Redwood Research, "Alignment faking in large language models," December 20, 2024.

  17. Anthropic Alignment Science team, Alignment Science Blog, 2024–2025.

  18. Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano, "Recursively Summarizing Books with Human Feedback," arXiv:2109.10862, September 22, 2021.

  19. Sam Altman, X post responding to Jan Leike's departure, May 17, 2024.

  20. Axios, "OpenAI's recent departures force leaders to reaffirm safety commitment," May 20, 2024.

  21. Fortune, "OpenAI promised 20% of its computing power to combat the most dangerous kind of AI — but never delivered," May 21, 2024.

  22. CNBC, "OpenAI dissolves Superalignment AI safety team," May 17, 2024.

  23. 80,000 Hours, "Episode #159 — Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less," 2023.



Career History

Organization | Title | Start | End
Google DeepMind | Research Scientist | 2017 | 2021
OpenAI | Alignment Team Lead; then Co-lead of Superalignment | 2021 | May 2024
Anthropic | Head of Alignment Science | May 2024 | present

Related Pages

Top Related Pages

Other

Ilya Sutskever · Sam Altman

Approaches

Weak-to-Strong Generalization · AI Alignment · Process Supervision

Risks

Reward Hacking · Scheming · Deceptive Alignment

Key Debates

AI Alignment Research Agendas · AI Accident Risk Cruxes · Why Alignment Might Be Easy

Concepts

Superintelligence · Optimistic Alignment Worldview · RLHF · Situational Awareness

Organizations

Google DeepMind

Analysis

Anthropic Valuation Analysis · Safety-Capability Tradeoff Model

Safety Research

Anthropic Core Views

Historical

Mainstream Era