Chris Olah
Also known as: Christopher Olah
Pioneer of neural network interpretability and visualization; co-founder of Anthropic; creator of Distill.pub and the Circuits thread at Transformer Circuits
Current Role: Co-founder, Interpretability
Organization: Anthropic
Expert Positions (2 topics)
| Topic | View | Estimate | Confidence | Date |
|---|---|---|---|---|
| How Hard Is Alignment? | Tractable via interpretability | Solvable with sufficient transparency tools | medium | 2021 |
| Will Advanced AI Be Deceptive? | Detectable with interpretability | Interpretability provides a "mulligan" to catch deception | medium | Dec 2021 |
Career History
Education
Attended University of Toronto (did not complete degree); Thiel Fellow
Publications & Resources (4)
- Zoom In: An Introduction to Circuits (2020, Paper, Interpretability)
- Concrete Problems in AI Safety (2016, Paper, Technical Safety)
- Understanding LSTM Networks (2015, Blog Post, Interpretability)
- Neural Networks, Types, and Functional Programming (2015, Blog Post, Interpretability)
Organization Roles (1)
Facts (8)
People
Employed By: Anthropic
Role / Title: Co-founder, Interpretability
Biographical
Education: Attended University of Toronto (did not complete degree); Thiel Fellow
Notable For: Pioneer of neural network interpretability and visualization; co-founder of Anthropic; creator of Distill.pub and the Circuits thread at Transformer Circuits
Social Media: @ch402
GitHub: https://github.com/colah
Google Scholar: https://scholar.google.com/citations?user=vKAKE1gAAAAJ
General
Website: https://colah.github.io