
Zoom In: An Introduction to Circuits

web

A seminal Distill.pub paper by Olah et al. (OpenAI, 2020) that launched the 'circuits' research thread, widely considered foundational reading for mechanistic interpretability research in AI safety.

Metadata

Importance: 90/100 | blog post | primary source

Summary

This foundational Distill article introduces the 'circuits' framework for neural network interpretability, arguing that by studying connections between neurons we can reverse-engineer meaningful algorithms in neural network weights. It proposes three speculative claims: that features are the fundamental units of neural networks, that features are connected by circuits, and that similar features and circuits recur across different models and tasks.

Key Points

  • Introduces the 'circuits' approach to mechanistic interpretability, treating neural networks as reverse-engineerable computational systems with meaningful internal structure.
  • Proposes that neural networks contain interpretable 'features' (e.g., curve detectors, high-low frequency detectors) as fundamental units of computation.
  • Argues that features are connected by 'circuits'—subgraphs of the network that implement identifiable algorithms.
  • Claims universality: similar features and circuits appear across different architectures and tasks, suggesting convergent computational solutions.
  • Uses the analogy of scientific 'zooming in' (microscopes→cells, crystallography→DNA) to frame mechanistic interpretability as a paradigm shift in understanding AI.
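
The idea of a circuit as a weighted subgraph between features can be illustrated with a toy sketch. This is not code from the article: the weight matrix, the feature indices, and the helper name `strongest_inputs` are all made up for illustration, and real circuit analysis in a convolutional network works with spatial weight kernels rather than a small dense matrix.

```python
import numpy as np

def strongest_inputs(weights, target, k=2):
    """Return (source_index, weight) pairs for the k largest-magnitude
    weights feeding into the later-layer feature `target`."""
    w = weights[target]
    order = np.argsort(-np.abs(w))[:k]  # sort by descending |weight|
    return [(int(i), float(w[i])) for i in order]

# Hypothetical weights: rows are 2 later-layer features,
# columns are 4 earlier-layer features. Values are invented.
W = np.array([
    [0.9, 0.8, -0.7, 0.0],
    [0.1, -0.2, 0.05, 0.6],
])

# The "circuit" for later-layer feature 0: the earlier features that
# excite it (positive weights) or inhibit it (negative weights) most.
circuit = strongest_inputs(W, target=0, k=3)
print(circuit)
```

In this toy setting, reading off which earlier features excite or inhibit a later feature is the small-scale analogue of inspecting the connections between neurons that the article describes.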

Cited by 2 pages

Page | Type | Quality
Chris Olah | Person | 27.0
Mechanistic Interpretability | Research Area | 59.0

Cached Content Preview

HTTP 200 | Fetched Apr 9, 2026 | 61 KB
Zoom In: An Introduction to Circuits

 Distill
 
 
 
 
 
 { "title": "Zoom In: An Introduction to Circuits", "description": "By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks.",
 "authors": [
 { "author": "Chris Olah", "authorURL": "https://colah.github.io", "affiliation": "OpenAI", "affiliationURL": "https://openai.com" },
 { "author": "Nick Cammarata", "authorURL": "http://nickcammarata.com", "affiliation": "OpenAI", "affiliationURL": "https://openai.com" },
 { "author": "Ludwig Schubert", "authorURL": "https://schubert.io/", "affiliation": "OpenAI", "affiliationURL": "https://openai.com" },
 { "author": "Gabriel Goh", "authorURL": "http://gabgoh.github.io", "affiliation": "OpenAI", "affiliationURL": "https://openai.com" },
 { "author": "Michael Petrov", "authorURL": "https://twitter.com/mpetrov", "affiliation": "OpenAI", "affiliationURL": "https://openai.com" },
 { "author": "Shan Carter", "authorURL": "http://shancarter.com", "affiliation": "OpenAI", "affiliationURL": "https://openai.com" }
 ] } 
 
 
 Zoom In: An Introduction to Circuits

 By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks.

Authors and Affiliations

Chris Olah, OpenAI
Nick Cammarata, OpenAI
Ludwig Schubert, OpenAI
Gabriel Goh, OpenAI
Michael Petrov, OpenAI
Shan Carter, OpenAI
 

 
 
 
Published: March 10, 2020

DOI: 10.23915/distill.00024.001

 
 
 
 
 
 
 
This article is part of the Circuits thread, an experimental format collecting invited short articles and critical commentary delving into the inner workings of neural networks.
 

 

 
 
 

 
 Introduction

 
 Many important transition points in the history of science have been moments when science “zoomed in.”
 At these points, we develop a visualization or tool that allows us to see the world in a new level of detail, and a new field of science develops to study the world through this lens.
 

 
 For example, microscopes let us see cells, leading to cellular biology. Science zoomed in. Several techniques including x-ray crystallography let us see DNA, leading to the molecular revolution. Science zoomed in. Atomic theory. Subatomic particles. Neuroscience. Science zoomed in.
 

 
 These transitions weren’t just a change in precision: they were qualitative changes in what the object

... (truncated, 61 KB total)
Resource ID: 346b1574c0c3ce67 | Stable ID: sid_DlMLRNm3TO