Longterm Wiki

deprioritizing SAE research

blog

Data Status

Not fetched

Cited by 3 pages

| Page | Type | Quality |
| --- | --- | --- |
| Interpretability | Safety Agenda | 66.0 |
| Mechanistic Interpretability | Approach | 59.0 |
| Sparse Autoencoders (SAEs) | Approach | 91.0 |

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 36 KB

# Negative Results for Sparse Autoencoders On Downstream Tasks and Deprioritising SAE Research (Mechanistic Interpretability Team Progress Update)

[DeepMind Safety Research](https://deepmindsafetyresearch.medium.com/?source=post_page---byline--6cadcfc125b9---------------------------------------)

9 min read · Mar 26, 2025

_Lewis Smith\*, Sen Rajamanoharan\*, Arthur Conmy, Callum McDougall, Janos Kramar, Tom Lieberum, Rohin Shah, Neel Nanda_

_\* = equal contribution_

The following piece is a list of snippets about research from the GDM mechanistic interpretability team, which we didn’t consider a good fit for turning into a paper, but which we thought the community might benefit from seeing in this less formal form. These are largely things that we found in the process of a project investigating whether sparse autoencoders were useful for downstream tasks, notably out-of-distribution probing.

This blo

... (truncated, 36 KB total)
Resource ID: 244c1b93ef0a083c | Stable ID: YjRiOWIyNj