Longterm Wiki

Personal website

Web: [jan.leike.name](https://jan.leike.name/)

Data Status

Not fetched

Cited by 3 pages

| Page | Type | Quality |
| --- | --- | --- |
| Jan Leike | Person | 27.0 |
| Scalable Oversight | Safety Agenda | 68.0 |
| Optimistic Alignment Worldview | Concept | 91.0 |

Cached Content Preview

HTTP 200 · Fetched Feb 23, 2026 · 4 KB
# Jan Leike

Machine learning & alignment researcher

_Optimizing for a post-AGI future where humanity flourishes_

[jan@anthropic.com](mailto:jan@anthropic.com)

[@janleike](https://twitter.com/janleike)

[blog](https://aligned.substack.com/)

[publications](https://jan.leike.name/publications.html)

## About me

I lead the Alignment Science team at Anthropic.
Previously, I co-led the [Superalignment Team](https://openai.com/blog/introducing-superalignment) at OpenAI, where I was involved in the development of [InstructGPT](https://openai.com/research/instruction-following), [ChatGPT](https://openai.com/blog/chatgpt), and the alignment of [GPT-4](https://openai.com/research/gpt-4).
I developed OpenAI’s [approach to alignment research](https://openai.com/blog/our-approach-to-alignment-research/) and co-authored the Superalignment Team’s research roadmap.
Prior to OpenAI, I was an alignment researcher at DeepMind, where I prototyped [reinforcement learning from human feedback](https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback).
I hold a PhD in reinforcement learning theory from the Australian National University.
In [2023](https://time.com/collection/time100-ai/6310616/jan-leike/) and [2024](https://time.com/7012867/jan-leike/), TIME magazine listed me as one of the 100 most influential people in AI.

## My Research

My research aims to solve
[the hard problem of alignment](https://aligned.substack.com/p/what-is-alignment):

_How can we train AI systems to follow human intent on tasks that are difficult for humans to evaluate directly?_

My team at Anthropic is researching how to align an [automated alignment researcher](https://aligned.substack.com/p/alignment-mvp), working on [scalable oversight](https://aligned.substack.com/p/ai-assisted-human-feedback), [weak-to-strong generalization](https://openai.com/index/weak-to-strong-generalization/), and robustness to jailbreaks.

Read more:

- [The podcast interview with 80,000 Hours](https://80000hours.org/podcast/episodes/jan-leike-superalignment/) ( [video](https://www.youtube.com/watch?v=ZP_N4q5U3eE)) is currently the best introduction to my thinking in podcast form, especially if you’re coming from machine learning.
- [The podcast interview with Daniel Filan](https://axrp.net/episode/2023/07/27/episode-24-superalignment-jan-leike.html) covers more technical questions and should be interesting to those already somewhat familiar with alignment research.
- [My blog about alignment research](https://aligned.substack.com/).

## Selected Publications

- **[LLM Critics Help Catch LLM Bugs](https://openai.com/index/finding-gpt4s-mistakes-with-gpt-4/)**


Nat McAleese, Rai Michael Pokorny, Juan Felipe Cerón Uribe, Evgenia Nitishinskaya, Maja Trebacz, Jan Leike. 2024.
- **[Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision](https://openai.com/index/weak-to-strong-generalization/)**


Collin Burns, Pavel Izmailov, Jan He

... (truncated, 4 KB total)
Resource ID: 2a84eb0982d4de6a | Stable ID: NWNjNGViNm