Circuits work
web · distill.pub · distill.pub/2020/circuits/
Data Status
Not fetched
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| AI Accident Risk Cruxes | Crux | 67.0 |
| Is Interpretability Sufficient for Safety? | Crux | 49.0 |
Cached Content Preview
HTTP 200 · Fetched Feb 23, 2026 · 8 KB
[Distill](https://distill.pub/)
# Thread: _Circuits_
What can we learn if we invest heavily in reverse engineering a single
neural network?
| Authors | Affiliations |
|---|---|
| [Nick Cammarata](http://nickcammarata.com/) | [OpenAI](https://openai.com/) |
| [Shan Carter](http://shancarter.com/) | [OpenAI](https://openai.com/) |
| [Gabriel Goh](http://gabgoh.github.io/) | [OpenAI](https://openai.com/) |
| [Chris Olah](https://colah.github.io/) | [OpenAI](https://openai.com/) |
| [Michael Petrov](https://twitter.com/mpetrov) | [OpenAI](https://openai.com/) |
| [Ludwig Schubert](https://schubert.io/) | [OpenAI](https://openai.com/) |
| [Chelsea Voss](https://csvoss.com/) | [OpenAI](https://openai.com/) |
| Ben Egan | [Mount Royal University](https://mtroyal.ca/) |
| [Swee Kiat Lim](https://greentfrapp.github.io/) | [Stanford University](https://stanford.edu/) |
### Published
March 10, 2020
### DOI
[10.23915/distill.00024](https://doi.org/10.23915/distill.00024)
In the original narrative of deep learning, each neuron builds
progressively more abstract, meaningful features by composing features in
the preceding layer. In recent years, there’s been some skepticism of this
view, but what happens if you take it really seriously?
InceptionV1 is a classic vision model with around 10,000 unique neurons — a large number, but still on a scale that a group effort could attack.
What if you simply go through the model, neuron by neuron, trying to
understand each one and the connections between them? The circuits
collaboration aims to find out.
## Articles & Comments
The natural unit of publication for investigating circuits seems to be
short papers on individual circuits or small families of features.
Compared to normal machine learning papers, this is a small and unusual
topic for a paper.
To facilitate exploration of this direction, Distill is inviting a
“thread” of short articles on circuits, interspersed with critical
commentary by experts in adjacent fields. The thread will be a living
document, with new articles added over time, organized through an open
Slack channel (#circuits in the
[Distill Slack](http://slack.distill.pub/)). Content in this
thread should be seen as early stage exploratory research.
Articles and comments are presented below in chronological order:
Does it make sense to treat individual neurons and the connections
between them as a serious object of study? This essay proposes three
claims which, if true, might justify serious inquiry into them: the
existence of meaningful features, the existence of meaningful circuits
between features, and the universality of those features and circuits.
It also discusses historical successes of science “zooming in,” whether
we should be concerned about this research being qualitative, and
approaches to rigorous investigation.
[Read Full Article](https://distill.pub/2020/circuits/zoom-in/)
An overview of all the neurons in the first five layers of
InceptionV1, organized into a taxonomy of “neuron groups.” This
article set
... (truncated, 8 KB total)
Resource ID: ad268b74cee64b6f | Stable ID: M2NkZjFjYm