Responsible Scaling: Comparing Government Guidance and Company Policy
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Institute for AI Policy and Strategy
Published by the Institute for AI Policy and Strategy (IAPS), this report is relevant for researchers and policymakers examining whether voluntary corporate AI safety commitments like RSPs are sufficiently rigorous or need regulatory reinforcement.
Metadata
Summary
This report from IAPS analyzes Responsible Scaling Policies (RSPs) adopted by AI companies, comparing them against government guidance frameworks. It critiques existing RSP implementations—particularly Anthropic's—for vague risk threshold definitions and insufficient external oversight, and recommends more rigorous, verifiable safety level criteria with independent accountability mechanisms.
Key Points
- Compares voluntary company Responsible Scaling Policies with emerging government AI safety guidance across multiple jurisdictions
- Critiques RSPs for lacking precise, measurable risk thresholds that would trigger mandatory safety interventions or capability pauses
- Argues external oversight mechanisms are largely absent, leaving companies to self-certify compliance with their own safety commitments
- Recommends stronger definitions of AI Safety Levels (ASLs) and third-party evaluation requirements to make RSPs more credible
- Highlights the gap between the ambition of responsible scaling frameworks and their current enforceability
Review
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Frontier AI Labs (Overview) | -- | 85.0 |
Cached Content Preview
Responsible Scaling: Comparing Government Guidance and Company Policy — Institute for AI Policy and Strategy
Responsible Scaling: Comparing Government Guidance and Company Policy
Research Report
Mar 11
Written By Bill Anderson-Samways
Read the full report
As advanced AI systems scale up in capability, companies will need to implement practices to identify, monitor, and mitigate potential risks. “Responsible capability scaling” is the specification of progressively higher levels of risk, roughly corresponding to model size or capabilities, and entailing progressively more stringent response measures. We evaluate the original example of a Responsible Scaling Policy (RSP) – that of Anthropic – against guidance on responsible capability scaling from the UK Department for Science, Innovation and Technology (DSIT).
Our top recommendations based on our critique of Anthropic’s RSP are:
Anthropic and other AI companies should define verifiable risk thresholds for their AI safety levels (ASLs, or equivalent), informed by tolerances for “societal risk” (SR) in other industries. Such risk thresholds should likely be lower than Anthropic’s current thresholds, and should be defined in terms of absolute risk above a given baseline, rather than relative risk over that baseline.
The literature we survey suggests that “maximum” SR tolerances for events involving ≥1,000 fatalities – Anthropic’s definition of a “catastrophic risk” – should range from 1E-04 to 1E-10 such events per year. “Broadly acceptable” tolerances are generally two orders of magnitude lower.
We tentatively suggest that Anthropic set their ASL-4 and ASL-3 thresholds in the “maximum” and “broadly acceptable” SR ranges, respectively. We think that Anthropic’s current risk thresholds probably exceed those ranges.
Ultimately, a government body, such as UK DSIT or the US National Institute of Standards and Technology (NIST), or an industry body such as the Frontier Model Forum (FMF), should develop standardized operationalizations of risk thresholds for RSPs.
Anthropic and other companies should specify thresholds for a more granular set of risk types at a given safety level – for example, distinguishing “biological misuse” from “cyber misuse” rather than relying on a single “misuse” category.
Anthropic and other companies should detail when they will alert government authorities of identified risks – currently, their RSP does not mention communication with governments outside of a narrow case (involving Anthropic’s response to a bad actor scaling dangerously fast). We suggest that risks should at minimum be communicated to relevant agencies when they reach a given threshold, for example, the ASL-3 or ASL-4 thresholds outlined
... (truncated, 4 KB total)