Summary: As of December 2025, 20 companies have published responsible scaling frameworks, and SaferAI grades the major policies only 1.9-2.2 out of 5 for specificity. The evidence suggests moderate effectiveness, hindered by the frameworks' voluntary nature, competitive pressure among frontier labs, and capability doubling times of roughly seven months that may outpace evaluation science; third-party verification (METR has evaluated 5+ models) and the Seoul Summit commitments (16 signatories) nonetheless represent meaningful coordination progress.
Responsible Scaling Policies (RSPs) are self-imposed commitments by AI labs to tie AI development to safety progress. The core idea is simple: before scaling to more capable systems, labs commit to demonstrating that their safety measures are adequate for the risks those systems would pose. If evaluations reveal dangerous capabilities without adequate safeguards, development should pause until safety catches up.
Anthropic introduced the first RSP in September 2023, establishing "AI Safety Levels" (ASL-1 through ASL-4+) analogous to biosafety levels. OpenAI followed with its Preparedness Framework in December 2023, and Google DeepMind published its Frontier Safety Framework in May 2024. By late 2024, twelve major AI companies had published some form of frontier AI safety policy, and the Seoul Summit secured voluntary commitments from sixteen companies.
RSPs represent a significant governance innovation because they create a mechanism for safety-capability coupling without requiring external regulation. As of December 2025, 20 companies have published frontier AI safety policies, up from 12 at the May 2024 Seoul Summit. Third-party evaluators like METR have conducted pre-deployment assessments of 5+ major models. However, RSPs face fundamental challenges: they are entirely voluntary with no legal enforcement; labs set their own thresholds (SaferAI grades the major policies only 1.9-2.2 out of 5); competitive pressure among the three or more frontier labs creates incentives to interpret policies permissively; and capability doubling times of approximately seven months may outpace evaluation science.
The three leading frontier AI labs have published distinct but conceptually similar frameworks. All share the core structure of capability thresholds triggering escalating safeguards, but differ in specificity, governance, and scope.
| Framework level | Bio/Chem threshold | Cyber threshold | AI R&D / Autonomy threshold |
|---|---|---|---|
| Anthropic ASL-3 | "Significantly enhances capabilities of non-state actors" beyond publicly available info | Autonomous cyberattacks on hardened targets | "Substantially accelerates" AI R&D timeline |
| OpenAI High | "Meaningful counterfactual assistance to novice actors" creating known threats | "New risks of scaled cyberattacks" | Self-improvement creating "new challenges for human control" |
| OpenAI Critical | "Unprecedented new pathways to severe harm" | Novel attack vectors at scale | Recursive self-improvement; 5x speed improvement |
| DeepMind CCL | "Heightened risk of severe harm" from bio capabilities | "Sophisticated cyber capabilities" | "Exceptional agency" and ML research capabilities |
Sources: Anthropic Responsible Scaling Policy, OpenAI Preparedness Framework v2, Google DeepMind Frontier Safety Framework v3.
RSPs create a framework linking capability levels to safety requirements. The core mechanism involves three interconnected processes: capability evaluation, safeguard assessment, and escalation decisions.
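A minimal sketch of that gating loop is shown below, in Python. The names, numeric capability levels, and return values are illustrative assumptions rather than anything taken from a published policy: evaluations establish the capability level, implemented safeguards are compared against it, and a pause is escalated when they fall short.

```python
# Illustrative sketch of the RSP gating logic described above; the names and
# numeric levels are hypothetical, not drawn from any lab's actual policy.
from dataclasses import dataclass

@dataclass
class EvalResult:
    domain: str            # e.g. "bio", "cyber", "ai_rnd"
    capability_level: int  # 1 = no meaningful risk ... higher = more dangerous

@dataclass
class SafeguardStatus:
    implemented_level: int  # highest safeguard tier currently in place

def scaling_decision(evals: list[EvalResult], safeguards: SafeguardStatus) -> str:
    """Capability evaluation -> safeguard assessment -> escalation decision."""
    required = max(e.capability_level for e in evals)   # 1. capability evaluation
    if safeguards.implemented_level >= required:         # 2. safeguard assessment
        return "proceed"                                  # safeguards adequate for the risk
    return "pause_and_escalate"                           # 3. escalation: pause until safety catches up

# Example: evaluations suggest ASL-3-like capability but only ASL-2 safeguards exist.
decision = scaling_decision(
    [EvalResult("bio", 3), EvalResult("cyber", 2)],
    SafeguardStatus(implemented_level=2),
)
print(decision)  # -> "pause_and_escalate"
```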
Anthropic’s ASL system is modeled after Biosafety Levels (BSL-1 through BSL-4) used for handling dangerous pathogens. Each level specifies both capability thresholds and required safeguards.
| Level | Capability definition | Deployment safeguards | Security standard |
|---|---|---|---|
| ASL-1 | No meaningful catastrophic risk | Standard terms of service | Basic security hygiene |
| ASL-2 | Meaningful uplift but not beyond publicly available info | Content filtering, usage policies | Current security measures |
| ASL-3 | Significantly enhances non-state actor capabilities beyond public sources; could substantially accelerate CBRN development or enable autonomous harm | Nation-state level protections (details TBD) | Air-gapped systems, extensive vetting |
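The ladder lends itself to a simple lookup structure. The sketch below encodes the table as a Python dictionary; the level descriptions paraphrase the table above, while the data structure and helper function are illustrative assumptions, not Anthropic's implementation.

```python
# Minimal encoding of the ASL ladder summarized in the table above.
# Descriptions paraphrase the published policy; the structure is illustrative.
ASL_LADDER = {
    1: {
        "capability": "No meaningful catastrophic risk",
        "deployment": "Standard terms of service",
        "security": "Basic security hygiene",
    },
    2: {
        "capability": "Meaningful uplift, but not beyond publicly available info",
        "deployment": "Content filtering, usage policies",
        "security": "Current security measures",
    },
    3: {
        "capability": "Significantly enhances non-state actor capabilities beyond public sources",
        "deployment": "Nation-state level protections (details TBD)",
        "security": "Air-gapped systems, extensive vetting",
    },
}

def required_safeguards(assessed_asl: int) -> dict:
    """Look up the deployment and security standards a model assessed at a
    given ASL must meet; higher levels (ASL-4+) are not yet defined here."""
    if assessed_asl not in ASL_LADDER:
        raise ValueError(f"ASL-{assessed_asl} standards are not yet defined in this sketch")
    return ASL_LADDER[assessed_asl]
```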
Current Status (January 2026): Most Claude models operate under ASL-2 safeguards. Anthropic activated its ASL-3 deployment and security standards in May 2025 for Claude Opus 4, after evaluations could not rule out ASL-3-level capabilities.
RSP v2.0 Changes: The October 2024 update separated "ASL" to refer to safeguard standards rather than model categories, introducing distinct "Capability Thresholds" and "Required Safeguards." Critics such as SaferAI argue this reduced specificity compared to v1, shifting from quantitative benchmarks to more qualitative thresholds.
OpenAI’s Preparedness Framework underwent a major revision in April 2025 (v2.0), simplifying from four risk levels to two actionable thresholds.
| Risk domain | High threshold | Critical threshold |
|---|---|---|
| Bio/Chemical | Meaningful assistance to novices creating known threats | Unprecedented new pathways to severe harm |

Key changes in v2.0:
- Simplified from Low/Medium/High/Critical to just High and Critical
- Removed "Persuasion" as a tracked category (now handled through standard safety processes)
- Added an explicit threshold for recursive self-improvement: achieving a generational improvement (e.g., o1 to o3) in 1/5th the 2024 development time (see the worked example after this list)
- The Safety Advisory Group (SAG) now oversees all threshold determinations
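The self-improvement threshold is essentially an arithmetic test. The sketch below works through it with an assumed nine-month baseline for a 2024-era generational jump; that baseline is a hypothetical placeholder, not a figure published by OpenAI.

```python
# Worked example of the v2.0 Critical threshold for recursive self-improvement:
# a generational jump (e.g. o1 -> o3) achieved in 1/5th of the 2024 development time.
# The 9-month baseline is an assumed placeholder, not OpenAI's actual figure.
BASELINE_MONTHS_2024 = 9.0    # assumed time for one generational jump in 2024
CRITICAL_FRACTION = 1 / 5     # fraction specified by the framework

critical_threshold_months = BASELINE_MONTHS_2024 * CRITICAL_FRACTION

def trips_critical(observed_months: float) -> bool:
    """Critical triggers if a comparable jump now takes <= 1/5 of the baseline."""
    return observed_months <= critical_threshold_months

print(critical_threshold_months)  # 1.8 months under the assumed baseline
print(trips_critical(2.5))        # False: 2.5 months is above the threshold
print(trips_critical(1.5))        # True: 1.5 months would trip Critical
```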
Recent Evaluations: OpenAI’s o3/o4-mini system card reported that neither model reached the High threshold in any tracked category, though biological and cyber capabilities continue trending upward.
| Framework | First published | Current version | Key elements |
|---|---|---|---|
| Anthropic Responsible Scaling Policy | Sep 2023 | v2.2 (May 2025) | ASL levels, deployment/security standards, external evals |
The Seoul AI Safety Summit (May 2024) achieved a historic first: 16 frontier AI companies from the US, Europe, the Middle East, and Asia signed the voluntary Frontier AI Safety Commitments. Signatories included Amazon, Anthropic, Cohere, G42, Google, IBM, Inflection AI, Meta, Microsoft, Mistral AI, Naver, OpenAI, Samsung, the Technology Innovation Institute, xAI, and Zhipu.ai.
| Commitment | Description | Compliance verification |
|---|---|---|
| Safety Framework Publication | Publish framework by France Summit 2025 | Public disclosure |
| Pre-deployment Evaluations | Test models for severe risks before deployment | Self-reported system cards |
| Dangerous Capability Reporting | Report discoveries to governments and other labs | Voluntary disclosure |
| Non-deployment Commitment | Do not deploy if risks cannot be mitigated | Self-assessed |
| Red-teaming | Internal and external adversarial testing | Third-party verification emerging |
| Cybersecurity | Protect model weights from theft | Industry standards |
Follow-up: An additional four companies have joined since May 2024. The France AI Action Summit (February 2025) reviewed compliance and expanded the commitments.
METR (Model Evaluation and Threat Research) has emerged as the leading independent evaluator, having conducted pre-deployment assessments for both Anthropic and OpenAI. Founded by Beth Barnes (a former OpenAI alignment researcher) in December 2023, METR does not accept compensation for evaluations, in order to maintain independence.
METR’s Role: METR’s GPT-4.5 pre-deployment evaluation piloted a new form of third-party oversight: verifying developers’ internal evaluation results rather than conducting fully independent assessments. This approach may scale better while maintaining accountability.
Coverage Gap: As of late 2025, METR’s analysis found that while 12 companies had published frontier safety policies, third-party evaluation coverage remains inconsistent, with most evaluations occurring only for the largest US labs.
Key sources:
- Anthropic Responsible Scaling Policy: anthropic.com/responsible-scaling-policy
- METR's frontier safety policy analysis: 12 companies published policies; significant variation in specificity
- SaferAI's RSP update critique: Anthropic v2.0 reduced specificity from quantitative to qualitative thresholds
Evaluation methodologies include:
- Uplift studies: compare AI-assisted vs. unassisted task completion. Strength: directly measures counterfactual risk. Limitation: hard to simulate real adversaries.
- Autonomous agent tasks: long-horizon task completion, where METR finds the length of tasks models can complete doubling roughly every seven months (see the sketch below).
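To see why a roughly seven-month doubling time pressures evaluation pipelines, the sketch below extrapolates task horizons forward. The one-hour starting horizon and two-year projection window are illustrative assumptions, not METR's figures; only the doubling time comes from the cited measurement.

```python
# Back-of-the-envelope projection based on METR's reported ~7-month doubling
# time for the length of tasks models can complete. The starting horizon and
# projection window below are illustrative assumptions.
DOUBLING_MONTHS = 7.0

def projected_horizon(current_horizon_hours: float, months_ahead: float) -> float:
    """Exponential extrapolation: horizon doubles every DOUBLING_MONTHS."""
    return current_horizon_hours * 2 ** (months_ahead / DOUBLING_MONTHS)

# If models handle ~1-hour tasks today (assumed), two years of the same trend:
print(round(projected_horizon(1.0, 24), 1))  # ~10.8 hours
```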
| Related factor | Direction of impact | Rationale |
|---|---|---|
| Safety Culture Strength | Positive | Creates explicit accountability mechanisms and public commitments |
| Racing Dynamics | Mixed | Could reduce racing if mutually honored; or create false confidence |
| Human Oversight Quality | Positive | Formalizes oversight requirements and third-party evaluation |
| International Coordination | Positive | Seoul commitments demonstrate cross-border coordination feasibility |
RSPs represent an important governance innovation that creates explicit links between capabilities and safety requirements. Their current contribution to safety is moderate but improving: the 2025 policy updates and Seoul commitments demonstrate industry convergence on the RSP concept, while third-party evaluation coverage expands. However, effectiveness depends critically on:
- Voluntary compliance in the absence of legal enforcement
- Evaluation quality and the ability to detect dangerous capabilities
- Competitive dynamics and whether labs will honor commitments under pressure
- Governance structures within labs that can override commercial interests
RSPs should be understood as a foundation for stronger governance rather than a complete solution. Their greatest value may be in establishing precedents and norms that can later be codified into binding regulation.