
Google DeepMind: Frontier Safety Framework Version 3.0


This is Google DeepMind's official safety framework document (version 3.0), analogous to Anthropic's Responsible Scaling Policy (RSP) and OpenAI's Preparedness Framework, and an essential reference for understanding industry approaches to frontier AI risk management.

Metadata

Importance: 78/100 · organizational report · primary source

Summary

Google DeepMind's Frontier Safety Framework (v3.0) defines protocols for identifying Critical Capability Levels (CCLs) at which frontier AI models may pose severe risks, and outlines mitigation approaches across three risk categories: misuse, ML R&D acceleration, and misalignment. The framework specifies risk assessment processes, response plans, and criteria for evaluating whether mitigations are sufficient before deployment.

Key Points

  • Defines 'Critical Capability Levels' (CCLs) as capability thresholds requiring specific mitigations across misuse, ML R&D, and misalignment risk categories.
  • Outlines a structured risk assessment process including evaluation of cross-cutting capabilities like agency, tool use, reasoning, and scientific understanding.
  • Acknowledges that some mitigations only provide societal value if adopted industry-wide, highlighting the need for coordinated action across AI labs.
  • Includes an exploratory approach to misalignment risk, reflecting the nascent state of research on detecting and mitigating misaligned AI systems.
  • Commits to periodic review and evolution of the framework as understanding of frontier model risks and capabilities improves.

Cited by 2 pages

Page                           Type      Quality
Corporate AI Safety Responses  Approach  68.0
AI Safety Cases                Approach  91.0

Cached Content Preview

HTTP 200 · Fetched Apr 7, 2026 · 40 KB
Frontier Safety Framework
Version 3.0
Published: September 22, 2025


Overview
The Frontier Safety Framework is a set of protocols that aims to address severe risks that may arise from the high-impact capabilities of frontier AI models. It complements Google's suite of AI responsibility and safety practices, and enables AI innovation and deployment consistent with our AI Principles.
The Framework is informed by the broader conversation on Frontier AI Safety and Security Frameworks.[1] The core components of such Frameworks are to:
● Identify capability levels at which frontier AI models, without additional mitigations, could pose severe risk.
● Implement protocols to detect the attainment of such capability levels throughout the model lifecycle.
● Prepare and articulate proactive mitigation plans to ensure severe risks are adequately mitigated when such capability levels are attained.
● Where required or appropriate, involve external parties to help inform and guide the approach.
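
To make the shape of these four components concrete, the following is a minimal sketch in Python of how such a deployment gate might be wired together. It is purely illustrative and not from the Framework: the CCL names, thresholds, eval scores, and function signatures are all hypothetical, and real capability evaluations and mitigation decisions are far more involved.

```python
# Hypothetical sketch of the four core components as a deployment gate.
# All names and numeric thresholds are illustrative, not from the Framework.

def identify_ccls() -> dict[str, float]:
    """Component 1: capability levels that, without mitigations, could pose severe risk."""
    return {"cbrn_uplift": 0.8, "cyber_autonomy": 0.7}  # hypothetical eval-score thresholds

def detect_attainment(model_scores: dict[str, float], ccls: dict[str, float]) -> list[str]:
    """Component 2: check evaluation results against CCL thresholds across the lifecycle."""
    return [name for name, threshold in ccls.items()
            if model_scores.get(name, 0.0) >= threshold]

def mitigations_sufficient(reached: list[str], applied: set[str]) -> bool:
    """Component 3: severe risks must be adequately mitigated when a CCL is attained."""
    return all(ccl in applied for ccl in reached)

def deployment_gate(model_scores, applied_mitigations, external_review=None):
    ccls = identify_ccls()
    reached = detect_attainment(model_scores, ccls)
    ok = mitigations_sufficient(reached, applied_mitigations)
    if external_review is not None:  # Component 4: involve external parties where appropriate
        ok = ok and external_review(reached)
    return ok

# Example: one CCL is reached but its mitigation is applied, so the gate passes.
print(deployment_gate({"cbrn_uplift": 0.85}, {"cbrn_uplift"}))  # True
```

The ordering mirrors the list above: thresholds are fixed first, detection runs against them throughout the model lifecycle, and mitigation sufficiency (plus optional external review) gates the outcome.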
In the Framework, we specify protocols for the detection of capability levels at which frontier AI models may pose severe risks (which we call "Critical Capability Levels", or CCLs), and articulate mitigation approaches to address such risks. The Framework addresses misuse risk,[2] risks from machine learning research and development (ML R&D), and misalignment risk.[3] For each type of risk, we define a set of CCLs and a mitigation approach for them. Risk assessment will necessarily involve evaluating cross-cutting capabilities such as agency, tool use, reasoning, and scientific understanding.
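
As a rough picture of the structure this paragraph describes (CCLs grouped by risk category, with detection informed by cross-cutting capability evaluations), the sketch below models it as a small Python data structure. This is an assumed data model, not DeepMind's: every class, field, threshold, and score is hypothetical, and the mean-based aggregation is a deliberate oversimplification of real risk assessment.

```python
from dataclasses import dataclass

# Assumed data model for the structure described above; all names are illustrative.
CROSS_CUTTING = ("agency", "tool_use", "reasoning", "scientific_understanding")

@dataclass
class CriticalCapabilityLevel:
    name: str
    risk_category: str   # "misuse" | "ml_rnd" | "misalignment"
    threshold: float     # hypothetical aggregate eval-score threshold
    mitigation: str      # the mitigation approach paired with this CCL

    def reached(self, evals: dict[str, float]) -> bool:
        # Toy aggregation over cross-cutting capability scores; actual risk
        # assessment would be far more nuanced than an unweighted mean.
        relevant = [evals[c] for c in CROSS_CUTTING if c in evals]
        return bool(relevant) and sum(relevant) / len(relevant) >= self.threshold

registry = [
    CriticalCapabilityLevel("uplift_for_attacks", "misuse", 0.8, "deployment safeguards"),
    CriticalCapabilityLevel("automated_ml_rnd", "ml_rnd", 0.9, "security controls"),
    CriticalCapabilityLevel("evades_oversight", "misalignment", 0.7, "exploratory monitoring"),
]

evals = {"agency": 0.6, "tool_use": 0.7, "reasoning": 0.8, "scientific_understanding": 0.5}
print([ccl.name for ccl in registry if ccl.reached(evals)])  # -> [] with these toy scores
```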
The safety and security of frontier AI models is a global public good. The protocols here represent our current understanding of, and approach to, how severe frontier AI risks may be anticipated and addressed. Importantly, there are certain mitigations whose social value is significantly reduced if not broadly applied to frontier AI models reaching critical capabilities. These mitigations are most effective when adopted by industry as a whole: our adoption of them would result in effective risk mitigation for society only if all relevant organisations provide similar levels of protection.
The Framework is based on early and evolving research. We may change our approach over time as we gain experience and insights on the projected capabilities of future frontier models. We will review the Framework periodically, and we expect it to evolve substantially as our understanding of the risks and benefits of frontier models improves.
[1] See https://www.gov.uk/gover
[2] That is, in the context of the Framework, risks of threat actors using critical capabilities of deployed or exfiltrated models to cause harm.
[3] Misalignment can pose a number of risks. In the context of the Framework, we address specific scenarios where general-purpose AI agents are potentially misaligned and can become difficult to control, thereby posing a risk of severe harm.

... (truncated, 40 KB total)
Resource ID: 3c56c8c2a799e4ef | Stable ID: sid_XOO1bHptdb