Responsible Scaling: Comparing Government Guidance and Company Policy
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Institute for AI Policy and Strategy
Published by the Institute for AI Policy and Strategy (IAPS), this report is relevant for researchers and policymakers examining whether voluntary corporate AI safety commitments like RSPs are sufficiently rigorous or need regulatory reinforcement.
Metadata
Summary
This report from IAPS analyzes Responsible Scaling Policies (RSPs) adopted by AI companies, comparing them against government guidance frameworks. It critiques existing RSP implementations—particularly Anthropic's—for vague risk threshold definitions and insufficient external oversight, and recommends more rigorous, verifiable safety level criteria with independent accountability mechanisms.
Key Points
- Compares voluntary company Responsible Scaling Policies with emerging government AI safety guidance across multiple jurisdictions
- Critiques RSPs for lacking precise, measurable risk thresholds that would trigger mandatory safety interventions or capability pauses
- Argues external oversight mechanisms are largely absent, leaving companies to self-certify compliance with their own safety commitments
- Recommends stronger definitions of AI Safety Levels (ASLs) and third-party evaluation requirements to make RSPs more credible
- Highlights the gap between the ambition of responsible scaling frameworks and their current enforceability
Review
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Frontier AI Labs (Overview) | -- | 85.0 |
Cached Content Preview
Responsible Scaling: Comparing Government Guidance and Company Policy — Institute for AI Policy and Strategy
Responsible Scaling: Comparing Government Guidance and Company Policy
Research Report
Mar 11
Written By Bill Anderson-Samways
Read the full report
As advanced AI systems scale up in capability, companies will need to implement practices to identify, monitor, and mitigate potential risks. “Responsible capability scaling” is the specification of progressively higher levels of risk, roughly corresponding to model size or capabilities, and entailing progressively more stringent response measures. We evaluate the original example of a Responsible Scaling Policy (RSP) – that of Anthropic – against guidance on responsible capability scaling from the UK Department for Science, Innovation and Technology (DSIT).
Our top recommendations based on our critique of Anthropic’s RSP are:
Anthropic and other AI companies should define verifiable risk thresholds for their AI safety levels (ASLs, or equivalent), informed by tolerances for “societal risk” (SR) in other industries. Such risk thresholds should likely be lower than Anthropic’s current thresholds, and should be defined in terms of absolute risk above a given baseline, rather than relative risk over that baseline.
The literature we survey suggests that “maximum” SR tolerances for events involving ≥1,000 fatalities – Anthropic’s definition of a “catastrophic risk” – should range from 1E-04 to 1E-10 such events per year. “Broadly acceptable” tolerances are generally two orders of magnitude lower.
We tentatively suggest that Anthropic set their ASL-4 and ASL-3 thresholds in the “maximum” and “broadly acceptable” SR ranges, respectively. We think that Anthropic’s current risk thresholds probably exceed those ranges.
Ultimately, a government body, such as UK DSIT or the US National Institute of Standards and Technology (NIST), or an industry body such as the Frontier Model Forum (FMF), should develop standardized operationalizations of risk thresholds for RSPs.
Anthropic and other companies should specify thresholds for a more granular set of risk types at a given safety level – for example, distinguishing “biological misuse” from “cyber misuse” rather than relying on a single “misuse” category.
Anthropic and other companies should detail when they will alert government authorities of identified risks – currently, their RSP does not mention communication with governments outside of a narrow case (involving Anthropic’s response to a bad actor scaling dangerously fast). We suggest that risks should at minimum be communicated to relevant agencies when they reach a given threshold, for example, the ASL-3 or ASL-4 thresholds outlined
... (truncated, 4 KB total)