Longterm Wiki

Mechanistic Interpretability Workshop at NeurIPS 2025

Web: mechinterpworkshop.com

Data Status

Not fetched

Cited by 2 pages

| Page | Type | Quality |
| --- | --- | --- |
| Interpretability | Safety Agenda | 66.0 |
| Mechanistic Interpretability | Approach | 59.0 |

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 5 KB

# Mechanistic Interpretability Workshop

NeurIPS 2025

Sunday, December 7, 2025

San Diego Convention Center · Room 30A-E


[Attended? Give feedback on the workshop!](https://docs.google.com/forms/d/e/1FAIpQLSe13gWSNKtrC3uGsrhgg5qRLHxSLCBmP3p7ZGoqxDKaV4_Cmg/viewform)


As neural networks grow in influence and capability, understanding the mechanisms behind their decisions remains a fundamental scientific challenge. This gap between performance and understanding limits our ability to predict model behavior, ensure reliability, and detect sophisticated adversarial or deceptive behavior. Many of the deepest scientific mysteries in machine learning may remain out of reach if we cannot look inside the black box.

[Mechanistic interpretability](https://arxiv.org/abs/2501.16496) addresses this challenge by developing principled methods to analyze and understand a model's internals (its weights and activations), and to use this understanding to gain greater insight into its behavior and the computation underlying it.

The field has grown rapidly: it now has sizable communities in academia, industry, and independent research; over 140 papers were submitted to our ICML 2024 workshop; dedicated startups have formed; and a rich ecosystem of tools and techniques has emerged. This workshop aims to bring together diverse perspectives from across the community to discuss recent advances, build common understanding, and chart future directions.

See our [Call for Papers](https://mechinterpworkshop.com/cfp/) for submission details and topics of interest.

## Keynote Speakers

![Chris Olah](https://mechinterpworkshop.com/img/chrisolah.jpeg)

### [Chris Olah](https://colah.github.io/about.html)

Interpretability Lead and Co-founder, Anthropic

![Been Kim](https://mechinterpworkshop.com/img/beenkim.jpeg)

### [Been Kim](https://beenkim.github.io/)

Senior Staff Research Scientist, Google DeepMind

![Sarah Schwettmann](https://mechinterpworkshop.com/img/sarahschwettmann.jpeg)

### [Sarah Schwettmann](https://cogconfluence.com/)

Co-founder, Transluce

![ICML 2024 Workshop](https://mechinterpworkshop.com/img/conference-pic.jpg)![ICML 2024 Social](https://mechinterpworkshop.com/img/rooftop-pic.jpg)

The first Mechanistic Interpretability Workshop (ICML 2024).

## Organizing Committee

![Neel Nanda](https://mechinterpworkshop.com/img/neelnanda.jpeg)

### [Neel Nanda](https://www.neelnanda.io/about)

Senior Research Scientist, Google DeepMind

![Andrew Lee](https://mechinterpworkshop.com/img/andrewlee.jpeg)

### [Andrew Lee](https://ajyl.github.io/)

Postdoc, Harvard

![Andy Arditi](https://mechinterpworkshop.com/img/andyardit

... (truncated, 5 KB total)