Longterm Wiki

Credibility Rating

High (4/5)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Google DeepMind

Data Status

Not fetched

Cited by 3 pages

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 8 KB

December 19, 2025
Responsibility & Safety

# Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

Language Model Interpretability Team


![A dark, high-tech background featuring two distinct blocks of text framed by a central, translucent circle displaying the blue and white logo "GemmaScope 2." The image illustrates a Large Language Model (LLM) behavior shift driven by the activation of internal features. The text on the left shows the model's default response, "As a large language model, I don't have feelings or personal preferences like...", while the text on the right shows a response with an activated "pirate" feature set, "As a pirate, me favourite activity is definitely plunderin' for treasure. Arrr!"](https://lh3.googleusercontent.com/_sAKBIumTtxyxwSyja9GQ6TJBPlS38HSuxJoYKLXnj_jiUmvXhi8v1mro_4TrN20v0DazJF_JXeTkadzdABbD4OLxErFqCzkXkYb3ItjIJfEG5WG=w1440-h810-n-nu)



Announcing a new, open suite of tools for language model interpretability

Large Language Models (LLMs) are capable of incredible feats of reasoning, yet their internal decision-making processes remain largely opaque. Should a system not behave as expected, a lack of visibility into its internal workings can make it difficult to pinpoint the exact reason for its behaviour. Last year, we advanced the science of interpretability with [Gemma Scope](https://deepmind.google/discover/blog/gemma-scope-helping-the-safety-community-shed-light-on-the-inner-workings-of-language-models/), a toolkit designed to help researchers understand the inner workings of Gemma 2, our lightweight collection of open models.

Today, we are releasing [Gemma Scope 2](https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/gemma-scope-2-helping-the-ai-safety-community-deepen-understanding-of-complex-language-model-behavior/Gemma_Scope_2_Technical_Paper.pdf): a comprehensive, open suite of interpretability tools for all [Gemma 3](https://deepmind.google/models/gemma/gemma-3/) model sizes, from 270M to 27B parameters. These tools enable researchers to trace potential risks across the entire "brain" of the model.
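The original Gemma Scope release was built around sparse autoencoders (SAEs), which decompose a model's internal activations into a wider set of sparse, more interpretable features. The following NumPy sketch illustrates that core idea only conceptually; the toy dimensions, random weights, and function names are all hypothetical and not part of the actual Gemma Scope 2 release.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: real SAEs map a model's residual stream (thousands of
# dimensions) into a much wider sparse feature space.
d_model, d_sae = 8, 32
W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) * 0.1

def sae_forward(activation):
    """Encode an activation vector into sparse features, then reconstruct it.

    The ReLU zeroes out most feature pre-activations, so only a handful
    of features are active for any given input -- those active features
    are what an interpretability researcher inspects.
    """
    features = np.maximum(activation @ W_enc + b_enc, 0.0)
    reconstruction = features @ W_dec
    return features, reconstruction

activation = rng.normal(size=d_model)
features, reconstruction = sae_forward(activation)
print(f"{int((features > 0).sum())} of {d_sae} features active")
```

In a trained SAE the encoder and decoder weights are fit so that the sparse features reconstruct the activation faithfully; here the random weights only demonstrate the shapes and the sparsifying nonlinearity.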

To our knowledge, this is the largest open-source release of interpretability tools by an AI lab to date. Producing Gemma Scope 2 involved storing approximately 110 petabytes of data and training over 1 trillion total parameters.

As AI continues to advance, we look forward to the AI research community using Gemma Scope 2 to debug emergent model behaviors, to better audit and debug AI agents, and ultimately to accelerate the development of practical and robust safety interventions against issues like jailbreaks, hallucinations and sycophancy.

Our [interactive Gemma

... (truncated, 8 KB total)
Resource ID: a1036bc63472c5fc | Stable ID: NGMxZDQ2YW