TransformerLens: A Library for Mechanistic Interpretability of Language Models
Web Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: GitHub
TransformerLens is a widely-used open-source library for mechanistic interpretability research on GPT-style language models, enabling researchers to inspect and manipulate internal activations to reverse-engineer learned algorithms — a core technique in AI safety interpretability work.
Metadata
Summary
TransformerLens is a Python library created by Neel Nanda that enables mechanistic interpretability research on GPT-2 style language models. It supports 50+ open-source models and allows researchers to cache, edit, and analyze internal activations. It has been used in numerous influential interpretability papers including work on grokking, circuit discovery, and neuron representations.
Key Points
- Supports loading and inspecting internals of 50+ open-source language models including GPT-2
- Allows caching any internal activation and adding hooks to edit, remove, or replace activations at runtime
- Used in foundational mechanistic interpretability research (grokking, induction heads, circuit discovery, etc.)
- Maintained by Bryce Meyer, originally created by Neel Nanda; actively developed with 3.4k+ GitHub stars
- Provides tutorials and demos to lower the barrier to entry for mechanistic interpretability research
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Probing / Linear Probes | Approach | 55.0 |
| Sparse Autoencoders (SAEs) | Approach | 91.0 |
Cached Content Preview
TransformerLens
A Library for Mechanistic Interpretability of Generative Language Models. Maintained by Bryce Meyer and created by Neel Nanda
This is a library for doing mechanistic interpretability of GPT-2 style language models. The goal of mechanistic interpretability is to take a trained model and reverse-engineer, from its weights, the algorithms the model learned during training.

TransformerLens lets you load in 50+ different open-source language models and exposes the internal activations of the model to you. You can cache any internal activation in the model, and add in functions to edit, remove, or replace these activations as the model runs.
Quick Start

Install

```shell
pip install transformer_lens
```

For Python 3.8 or 3.9:

```shell
pip install 'transformer_lens~=2.0'
```
Use

```python
from transformer_lens.model_bridge import TransformerBridge

# Load a model (eg GPT-2 Small)
bridge = TransformerBridge.boot_transformers("gpt2", device="cpu")

# Run the model and get logits and activations
logits, activations = bridge.run_with_cache("Hello World")
```
TransformerBridge is the recommended 3.0 path and supports 50+ architectures. The legacy HookedTransformer.from_pretrained API is still available through a compatibility layer but is deprecated; see the Migrating to TransformerLens 3 guide for conversion recipes.
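The cache-and-hook pattern described above can be illustrated with a minimal, library-free sketch: every intermediate value passes through a named hook point where callbacks can record or overwrite it as the model runs. All names here (`HookPoint`, `TinyModel`) are illustrative stand-ins, not the TransformerLens API, and the "model" is just two doubling layers.

```python
# Minimal sketch of the cache-and-hook idea: each layer's output flows
# through a named HookPoint, and callbacks attached to that point can
# record the value (caching) or replace it (activation patching/ablation).
# HookPoint and TinyModel are hypothetical names for illustration only.

class HookPoint:
    def __init__(self, name):
        self.name = name
        self.hooks = []  # callbacks: fn(value, name) -> possibly-new value

    def __call__(self, value):
        for fn in self.hooks:
            value = fn(value, self.name)
        return value


class TinyModel:
    """Two 'layers', each of which doubles its input, wrapped in hook points."""

    def __init__(self):
        self.hook_points = {
            f"layer{i}.out": HookPoint(f"layer{i}.out") for i in range(2)
        }

    def forward(self, x):
        for i in range(2):
            x = self.hook_points[f"layer{i}.out"](x * 2)
        return x

    def run_with_cache(self, x):
        # Temporarily attach a recording hook to every hook point,
        # run the model, then detach the hooks.
        cache = {}

        def record(value, name):
            cache[name] = value
            return value

        for hp in self.hook_points.values():
            hp.hooks.append(record)
        try:
            out = self.forward(x)
        finally:
            for hp in self.hook_points.values():
                hp.hooks.remove(record)
        return out, cache


model = TinyModel()
out, cache = model.run_with_cache(1)
print(out, cache)  # 4 {'layer0.out': 2, 'layer1.out': 4}

# "Ablate" layer0 by replacing its activation with 0, mirroring the
# edit/remove/replace workflow the README describes.
model.hook_points["layer0.out"].hooks.append(lambda value, name: 0)
print(model.forward(1))  # 0 (layer1 doubles the zeroed activation)
```

In TransformerLens itself the same idea applies at scale: hook points are attached to real transformer activations (attention patterns, MLP outputs, residual stream), and the cache returned by `run_with_cache` maps their names to tensors.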
Key Tutorials
Introduction to the Library and Mech Interp
Demo of Main TransformerLens Features
Gallery
Research done involving TransformerLens:
- Progress Measures for Grokking via Mechanistic Interpretability (ICLR Spotlight, 2023) by Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
- Finding Neurons in a Haystack: Case Studies with Sparse Probing by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
- Towards Automated Circuit Discovery for Mechanistic Interpretability by Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
- Actually, Othel
... (truncated, 10 KB total)