# Jan Leike
Machine learning & alignment researcher
_Optimizing for a post-AGI future where humanity flourishes_
[jan@anthropic.com](mailto:Jan%20Leike%20%3Cjan%40anthropic.com%3E)
[@janleike](https://twitter.com/janleike)
[blog](https://aligned.substack.com/)
[publications](https://jan.leike.name/publications.html)
## About me
I lead the Alignment Science team at Anthropic.
Previously, I co-led the [Superalignment Team](https://openai.com/blog/introducing-superalignment) at OpenAI, where I was involved in the development of [InstructGPT](https://openai.com/research/instruction-following) and [ChatGPT](https://openai.com/blog/chatgpt), and in the alignment of [GPT-4](https://openai.com/research/gpt-4).
I developed OpenAI’s [approach to alignment research](https://openai.com/blog/our-approach-to-alignment-research/) and co-authored the Superalignment Team’s research roadmap.
Prior to OpenAI, I was an alignment researcher at DeepMind, where I prototyped [reinforcement learning from human feedback](https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback).
I hold a PhD in reinforcement learning theory from the Australian National University.
In [2023](https://time.com/collection/time100-ai/6310616/jan-leike/) and [2024](https://time.com/7012867/jan-leike/), TIME magazine listed me as one of the 100 most influential people in AI.
## My Research
My research aims to solve
[the hard problem of alignment](https://aligned.substack.com/p/what-is-alignment):
_How can we train AI systems to follow human intent on tasks that are difficult for humans to evaluate directly?_
My team at Anthropic is researching how to align an [automated alignment researcher](https://aligned.substack.com/p/alignment-mvp), working on [scalable oversight](https://aligned.substack.com/p/ai-assisted-human-feedback), [weak-to-strong generalization](https://openai.com/index/weak-to-strong-generalization/), and robustness to jailbreaks.
Read more:
- [The podcast interview with 80,000 Hours](https://80000hours.org/podcast/episodes/jan-leike-superalignment/) ([video](https://www.youtube.com/watch?v=ZP_N4q5U3eE)) is currently the best introduction to my thinking in podcast form, especially if you’re coming from machine learning.
- [The podcast interview with Daniel Filan](https://axrp.net/episode/2023/07/27/episode-24-superalignment-jan-leike.html) goes into more technical questions, and should be interesting for those already somewhat familiar with alignment research.
- [My blog about alignment research](https://aligned.substack.com/).
## Selected Publications
- **[LLM Critics Help Catch LLM Bugs](https://openai.com/index/finding-gpt4s-mistakes-with-gpt-4/)**
Nat McAleese, Rai Michael Pokorny, Juan Felipe Cerón Uribe, Evgenia Nitishinskaya, Maja Trebacz, Jan Leike. 2024.
- **[Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision](https://openai.com/index/weak-to-strong-generalization/)**
Collin Burns, Pavel Izmailov, Jan He
... (truncated, 4 KB total)