Longterm Wiki

Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Google DeepMind

An institutional safety framework from Google DeepMind, comparable to Anthropic's Responsible Scaling Policy and OpenAI's Preparedness Framework; useful for understanding industry approaches to capability thresholds and deployment-gated safety commitments.

Metadata

Importance: 72/100 · blog post · primary source

Summary

DeepMind's Frontier Safety Framework (FSF) establishes a structured approach to identifying and mitigating catastrophic risks from highly capable AI models before and during deployment. It defines 'Critical Capability Levels' (CCLs), capability thresholds at which a model could play a role in severe harm; 'early warning evaluations' run periodically to detect when a model is approaching a CCL; and mitigation plans to prevent severe harms such as bioweapons uplift or AI autonomously undermining human oversight. The framework represents a concrete institutional commitment to capability-gated safety protocols.

Key Points

  • Defines 'Critical Capability Levels' (CCLs) as capability thresholds in domains such as biosecurity, cybersecurity, and AI autonomy at which a model could play a role in severe harm, triggering a mitigation plan when approached (see the illustrative sketch after this list).
  • Establishes a two-part mitigation requirement: deployment controls to prevent misuse and model-level safeguards to prevent autonomous harmful actions.
  • Focuses on two primary risk domains: uplift to catastrophic weapons (CBRN) and AI systems capable of undermining human oversight mechanisms.
  • Commits to regular evaluation of frontier models against CCLs before training completion and before deployment.
  • Positions FSF as a living document intended to evolve alongside capability advances and emerging safety research.
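
A minimal sketch of the capability-gated logic these points describe, in Python. It is illustrative only, not DeepMind's implementation: the domain names, threshold values, and function names below are assumptions introduced here for clarity.

```python
# Illustrative sketch (not DeepMind's code) of capability-gated safety review.
# Domain names, thresholds, and scores are hypothetical placeholders.
from dataclasses import dataclass


@dataclass
class CriticalCapabilityLevel:
    domain: str               # e.g. "biosecurity", "cybersecurity", "autonomy"
    description: str          # minimal capability that could enable severe harm
    warning_threshold: float  # early-warning score treated as "approaching the CCL"


@dataclass
class EvalResult:
    domain: str
    score: float              # output of an early warning evaluation suite


def domains_requiring_mitigation(
    results: list[EvalResult],
    ccls: list[CriticalCapabilityLevel],
) -> list[str]:
    """Return domains where the model is approaching a CCL and a mitigation
    plan (e.g. deployment controls and model-level safeguards) should apply."""
    thresholds = {c.domain: c for c in ccls}
    flagged = []
    for result in results:
        ccl = thresholds.get(result.domain)
        if ccl is not None and result.score >= ccl.warning_threshold:
            flagged.append(result.domain)
    return flagged


if __name__ == "__main__":
    ccls = [
        CriticalCapabilityLevel("biosecurity", "meaningful uplift to CBRN attacks", 0.7),
        CriticalCapabilityLevel("autonomy", "undermining human oversight", 0.6),
    ]
    results = [EvalResult("biosecurity", 0.4), EvalResult("autonomy", 0.65)]
    print(domains_requiring_mitigation(results, ccls))  # ['autonomy']
```

In this reading, the early-warning threshold sits below the CCL itself, so a mitigation plan can be prepared before the threshold is actually crossed, matching the framework's stated aim of detecting approach rather than arrival.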

Cited by 6 pages

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 7 KB
Introducing the Frontier Safety Framework — Google DeepMind · May 17, 2024 · Responsibility & Safety

 Anca Dragan, Helen King and Allan Dafoe

 Our approach to analyzing and mitigating future risks posed by advanced AI models

 Google DeepMind has consistently pushed the boundaries of AI, developing models that have transformed our understanding of what's possible. We believe that AI technology on the horizon will provide society with invaluable tools to help tackle critical global challenges, such as climate change, drug discovery, and economic productivity. At the same time, we recognize that as we continue to advance the frontier of AI capabilities, these breakthroughs may eventually come with new risks beyond those posed by present-day models.

 Today, we are introducing our Frontier Safety Framework — a set of protocols for proactively identifying future AI capabilities that could cause severe harm and putting in place mechanisms to detect and mitigate them. Our Framework focuses on severe risks resulting from powerful capabilities at the model level, such as exceptional agency or sophisticated cyber capabilities. It is designed to complement our alignment research, which trains models to act in accordance with human values and societal goals, and Google’s existing suite of AI responsibility and safety practices.

 The Framework is exploratory and we expect it to evolve significantly as we learn from its implementation, deepen our understanding of AI risks and evaluations, and collaborate with industry, academia, and government. Even though these risks are beyond the reach of present-day models, we hope that implementing and improving the Framework will help us prepare to address them. We aim to have this initial framework fully implemented by early 2025.

 The framework

 The first version of the Framework announced today builds on our research on evaluating critical capabilities in frontier models, and follows the emerging approach of Responsible Capability Scaling. The Framework has three key components:

 Identifying capabilities a model may have with potential for severe harm. To do this, we research the paths through which a model could cause severe harm in high-risk domains, and then determine the minimal level of capabilities a model must have to play a role in causing such harm. We call these “Critical Capability Levels” (CCLs), and they guide our evaluation and mitigation approach.
 Evaluating our frontier models periodically to detect when they reach these Critical Capability Levels. To do this, we will develop suites of model evaluations, called “early warning evaluations,” that will alert us when a model is approaching a CCL, and run them frequently enough that we have notice before that threshold is reached.
 Applying a mitigation plan when a model passes our early warning evaluations. This should take into account the overall balance of benefits and risks, and 

... (truncated, 7 KB total)
Resource ID: d8c3d29798412b9f | Stable ID: sid_4dfsUFHiSQ