AI models can be dangerous before public deployment
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: METR
Published by METR (Model Evaluation and Threat Research), a leading AI safety evaluation organization, this piece is relevant to policy discussions about where in the AI development lifecycle safety obligations should apply.
Metadata
Summary
METR argues that existing AI safety frameworks are too narrowly focused on pre-deployment testing, neglecting the risks posed by internal model usage during development. The piece contends that internal access, staging environments, and pre-release workflows can themselves be vectors for harm. This challenges the assumption that public deployment is the primary risk threshold requiring oversight.
Key Points
- Safety evaluations focused only on deployment gates miss risks arising from internal model access during development and testing phases.
- Internal use of powerful AI models by developers and testers can expose sensitive systems and create harm pathways before any public release.
- Current industry and regulatory frameworks implicitly treat public deployment as the key risk threshold, which METR argues is insufficient.
- Pre-deployment internal workflows may lack the same scrutiny, access controls, and monitoring applied to publicly released systems.
- Calls for broader safety frameworks that account for the full lifecycle of model development, not just the deployment moment.
Review
Cached Content Preview
AI models can be dangerous before public deployment - METR
Date: January 17, 2025
BibTeX Citation
@misc{ai-models-can-be-dangerous-before-public-deployment,
  title = {AI models can be dangerous before public deployment},
  author = {METR},
  howpublished = {\url{https://metr.org/blog/2025-01-17-ai-models-dangerous-before-public-deployment/}},
  year = {2025},
  month = {01},
}
Many frontier AI safety policies from scaling labs (e.g. OpenAI’s Preparedness Framework, Google DeepMind’s Frontier Safety Framework, etc.), as well as past work by third party evaluators including UK AISI, Apollo Research, and METR, focus on pre-deployment testing – ensuring that the AI model is safe and that the lab has sufficient security before the lab deploys the model to the public.
Such pre-deployment safety evaluations are standard for a wide variety of products across many industries, where the primary risk of the product is to the consumer (see, for example, the crash testing conducted on cars, choking hazard testing for children’s toys, or the various clinical trials for medical devices). A pre-deployment testing–centric framework makes sense for AI development if AI is analogous to such products, and the majority of AI risks come from malicious end-users or mass adoption. 1
But unlike most products, possessing or internally using a powerful AI can create externalities that pose large risks to the public, including:
Model theft and misuse by motivated actors. In the wrong hands, powerful models can empower people to do dangerous things, and absent strong security it’s tempting for malicious actors to steal model weights or algorithmic secrets (to build their own models) and use them to do harm. Pre-deployment testing does little to address harms from model theft. 2
Catastrophic misuse resulting from internal use. Employees at labs may misuse the AI model for ideological or practical reasons, and society as a whole probably does not want a few individuals to decide how to use incredibly powerful AI systems in secret. Pre-deployment testing, if it occurs after internal usage, does nothing to prevent internal misuse.
Powerful AI pursuing unintended and undesirable goals. AI agents may autonomously pursue misaligned or unintended goals without direct h
... (truncated, 11 KB total)