deprioritizing SAE research
blog · deepmindsafetyresearch.medium.com · deepmindsafetyresearch.medium.com/negative-results-for-sp...
Data Status
Not fetched
Cited by 3 pages
| Page | Type | Quality |
|---|---|---|
| Interpretability | Safety Agenda | 66.0 |
| Mechanistic Interpretability | Approach | 59.0 |
| Sparse Autoencoders (SAEs) | Approach | 91.0 |
Cached Content Preview
HTTP 200 · Fetched Feb 26, 2026 · 36 KB

# Negative Results for Sparse Autoencoders On Downstream Tasks and Deprioritising SAE Research (Mechanistic Interpretability Team Progress Update)
[DeepMind Safety Research](https://deepmindsafetyresearch.medium.com/?source=post_page---byline--6cadcfc125b9---------------------------------------)
9 min read · Mar 26, 2025
_Lewis Smith\*, Sen Rajamanoharan\*, Arthur Conmy, Callum McDougall, Janos Kramar, Tom Lieberum, Rohin Shah, Neel Nanda_
_\* = equal contribution_
The following piece is a list of snippets about research from the GDM mechanistic interpretability team, which we didn’t consider a good fit for turning into a paper, but which we thought the community might benefit from seeing in this less formal form. These are largely things that we found in the process of a project investigating whether sparse autoencoders were useful for downstream tasks, notably out-of-distribution probing.
This blo
... (truncated, 36 KB total)
Resource ID: 244c1b93ef0a083c | Stable ID: YjRiOWIyNj