Longterm Wiki

Mechanistic Interpretability Workshop at NeurIPS 2025

web
mechinterpworkshop.com

This is the official workshop page for the NeurIPS 2025 Mechanistic Interpretability Workshop, a key community event for researchers developing tools to understand neural network internals as part of the broader AI safety research agenda.

Metadata

Importance: 55/100 · homepage

Summary

The Mechanistic Interpretability Workshop at NeurIPS 2025 is a dedicated academic venue for researchers working on understanding the internal computations of neural networks. It brings together work on circuits, features, sparse autoencoders, and related techniques aimed at reverse-engineering how AI models process information. The workshop represents a key gathering point for the interpretability research community.

Key Points

  • Official workshop at NeurIPS 2025 focused on mechanistic interpretability research
  • Covers core MI topics including circuits, features, and sparse autoencoders
  • Serves as a major venue for presenting and discussing advances in neural network interpretability
  • Connects academic and industry researchers working on understanding AI model internals
  • Reflects growing institutional recognition of mechanistic interpretability as a serious research field

Cited by 2 pages

Page                          Type           Quality
Interpretability              Research Area  66.0
Mechanistic Interpretability  Research Area  59.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 2 KB
Mechanistic Interpretability Workshop at ICML 2026

 July 10 or 11, 2026 (TBC)
COEX Convention & Exhibition Center · Seoul, South Korea

Call for Papers is now live. Submissions are due May 8th, 2026 (AOE).
 
As neural networks grow in influence and capability, understanding the mechanisms behind their decisions remains a fundamental scientific challenge. This gap between performance and understanding limits our ability to predict model behavior, ensure reliability, and detect sophisticated adversarial or deceptive behavior. Many of the deepest scientific mysteries in machine learning may remain out of reach if we cannot look inside the black box.

Mechanistic interpretability addresses this challenge by developing principled methods to analyze and understand a model's internals (weights and activations) and to use this understanding to gain greater insight into its behavior and the computation underlying it.
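To make that concrete, here is a minimal, self-contained sketch of the workflow this paragraph describes: capture a model's internal activations, then fit a sparse autoencoder to decompose them into sparser candidate features. The toy model, hooked layer, and all hyperparameters below are illustrative assumptions, not taken from the workshop or any particular paper.

```python
# Illustrative sketch only: a toy model, a forward hook to capture
# activations, and a tiny sparse autoencoder (SAE) trained on them.
# All sizes and hyperparameters here are arbitrary assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model whose internals we want to examine.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

# Step 1: capture activations at a chosen layer with a forward hook.
captured = []
model[1].register_forward_hook(lambda mod, inp, out: captured.append(out.detach()))

with torch.no_grad():
    model(torch.randn(1024, 32))   # run (random) inputs through the model
acts = torch.cat(captured)         # (1024, 64) matrix of hidden activations

# Step 2: fit an overcomplete sparse autoencoder with an L1 penalty,
# so each activation vector is explained by a few "feature" directions.
d_model, d_feat = acts.shape[1], 256
encoder = nn.Linear(d_model, d_feat)
decoder = nn.Linear(d_feat, d_model)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for _ in range(500):
    feats = torch.relu(encoder(acts))   # sparse feature activations
    recon = decoder(feats)              # reconstruction of the activations
    loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Each column decoder.weight[:, i] is a candidate feature direction in
# activation space; in real interpretability work these would be inspected
# and tested against model behavior.
print(f"reconstruction + sparsity loss: {loss.item():.4f}")
print(f"fraction of active features:    {feats.gt(0).float().mean().item():.3f}")
```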

The field has grown rapidly, with sizable communities in academia, industry, and independent research, dedicated startups, and a rich ecosystem of tools and techniques. Following our workshops at ICML 2024 and NeurIPS 2025, this edition at ICML 2026 aims to bring together diverse perspectives from the community to discuss recent advances, build common understanding, and chart future directions.

Call for Papers is now live! Submit on OpenReview. Paper submission deadline: May 8th, 2026 (AOE).

 
 The first Mechanistic Interpretability Workshop (ICML 2024).

Organizing Committee

  • Neel Nanda · Google DeepMind
  • Andrew Lee · Harvard
  • Andy Arditi · Northeastern University
  • Stefan Heimersheim · Adecco
  • Anna Soligo · Imperial College London
  • J Rosser · University of Oxford
  • Iván Arcuschin · Poseidon Research
  • Taero Kim · Yonsei University

Questions? Email mechinterpworkshop@gmail.com

 What are those beautiful rainbow flower things?

 These are visualizations of "curve detector" neurons from early mechanistic interpretability research. Learn more in the Curve Detectors article on Distill.
Resource ID: e78a965cde8d82bd | Stable ID: sid_I96bzlhaa1