Credibility Rating
3/5
Good (3): Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: 80,000 Hours
Data Status
Not fetched
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Anthropic Core Views | Safety Agenda | 62.0 |
| Mechanistic Interpretability | Approach | 59.0 |
Cached Content Preview
HTTP 200 | Fetched Feb 23, 2026 | 98 KB
## On this page:
- [Introduction](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#top)
- [1 Highlights](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#highlights)
- [2 Articles, books, and other media discussed in the show](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#articles-books-and-other-media-discussed-in-the-show)
- [3 Transcript](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#transcript)
- [3.1 Rob's intro \[00:00:00\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#robs-intro-000000)
- [3.2 The interview begins \[00:02:19\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#the-interview-begins-000219)
- [3.3 Interpretability \[00:05:54\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#interpretability-000554)
- [3.4 Features and circuits \[00:15:11\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#features-and-circuits-001511)
- [3.5 How neural networks think \[00:24:38\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#how-neural-networks-think-002438)
- [3.6 Multimodal neurons \[00:33:30\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#multimodal-neurons-003330)
- [3.7 Safety implications \[00:41:01\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#safety-implications-004101)
- [3.8 Can this approach scale? \[00:53:41\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#can-this-approach-scale-005341)
- [3.9 Disagreement within the field \[01:06:36\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#disagreement-within-the-field-010636)
- [3.10 The importance of visualisation \[01:14:08\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#the-importance-of-visualisation-011408)
- [3.11 Digital suffering \[01:20:49\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#digital-suffering-012049)
- [3.12 Superhuman systems \[01:25:06\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#superhuman-systems-012506)
- [3.13 Language models \[01:32:38\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#language-models-013238)
- [3.14 Sceptical arguments that trouble Chris \[01:38:44\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#sceptical-arguments-that-trouble-chris-013844)
- [3.15 How wonderful it would be if this could succeed \[01:42:57\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#how-wonderful-it-would-be-if-this-could-succeed-014257)
- [3.16 Ways that interpretability research could help us avoid disaster \[01:45:50\]](https://80000hours.
... (truncated, 98 KB total)

Resource ID: 5c66c0b83538d580 | Stable ID: ZWE3ODM1ND