Credibility Rating
3/5
Good (3): Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: 80,000 Hours
Data Status
Not fetched
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Anthropic Core Views | Safety Agenda | 62.0 |
| Mechanistic Interpretability | Approach | 59.0 |
Cached Content Preview
HTTP 200 | Fetched Feb 23, 2026 | 98 KB
## On this page:
- [Introduction](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#top)
- [1 Highlights](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#highlights)
- [2 Articles, books, and other media discussed in the show](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#articles-books-and-other-media-discussed-in-the-show)
- [3 Transcript](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#transcript)
- [3.1 Rob's intro \[00:00:00\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#robs-intro-000000)
- [3.2 The interview begins \[00:02:19\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#the-interview-begins-000219)
- [3.3 Interpretability \[00:05:54\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#interpretability-000554)
- [3.4 Features and circuits \[00:15:11\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#features-and-circuits-001511)
- [3.5 How neural networks think \[00:24:38\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#how-neural-networks-think-002438)
- [3.6 Multimodal neurons \[00:33:30\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#multimodal-neurons-003330)
- [3.7 Safety implications \[00:41:01\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#safety-implications-004101)
- [3.8 Can this approach scale? \[00:53:41\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#can-this-approach-scale-005341)
- [3.9 Disagreement within the field \[01:06:36\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#disagreement-within-the-field-010636)
- [3.10 The importance of visualisation \[01:14:08\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#the-importance-of-visualisation-011408)
- [3.11 Digital suffering \[01:20:49\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#digital-suffering-012049)
- [3.12 Superhuman systems \[01:25:06\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#superhuman-systems-012506)
- [3.13 Language models \[01:32:38\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#language-models-013238)
- [3.14 Sceptical arguments that trouble Chris \[01:38:44\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#sceptical-arguments-that-trouble-chris-013844)
- [3.15 How wonderful it would be if this could succeed \[01:42:57\]](https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/#how-wonderful-it-would-be-if-this-could-succeed-014257)
- [3.16 Ways that interpretability research could help us avoid disaster \[01:45:50\]](https://80000hours.
... (truncated, 98 KB total)

Resource ID: 5c66c0b83538d580 | Stable ID: ZWE3ODM1ND