
How Anthropic’s AI Safety Framework Misses the Mark | The Midas Project

Cached Content Preview (HTTP 200, fetched Feb 25, 2026, 286 KB)
How Anthropic’s AI Safety Framework Misses the Mark

Jack Kelly • Jul 8, 2025 • 9 min read

Anthropic has tried to build a reputation for taking AI safety seriously, and its Responsible Scaling Policy has become a central pillar of that identity. But while the company presents this framework as a rigorous safeguard, it ultimately falls short of the rigor needed to meaningfully protect against the risks posed by increasingly capable AI systems.

Anthropic was the first AI company to release a Frontier AI Safety Policy, known as their Responsible Scaling Policy (RSP). These frameworks, sometimes called “red line” or “if-then” commitments, focus on defining a set of safety and security risk mitigations that will be put into place before deploying increasingly powerful models. Anthropic describes their policy, a detailed 23-page public document, as a “public commitment not to train or deploy models capable of causing catastrophic harm unless we have implemented safety and security measures that will keep risks below acceptable levels.”

They insist that this policy is more than a symbolic gesture. Anthropic describes it as core to the culture and purpose of the company. Co-founder Tom Brown says that “in the same way that the U.S. treats the Constitution as the holy document … the RSP is like the holy document for Anthropic.” Co-founder Dario Amodei stated that the RSP “forces unity because if any part of the org is not in line with our safety values, it shows up through the RSP. The RSP is going to block them from doing what they want to do … it’s not just a bunch of bromides that we repeat, it’s something that if you show up here and you’re not aligned, you actually run into it.”

The biggest problem with Anthropic’s RSP is that its risk thresholds are extraordinarily high. Practically speaking, the current RSP requires no deployment or security safeguards beyond today’s until Anthropic releases a model with the capability to:

- allow effective compute (a proxy for AI progress) to increase 1000x within a single year; or
- allow entry-level PhD biologists to approximate the capabilities of world-class, state-backed bioweapons teams.

Both of these standards are astronomical. Even AI models halfway between today’s state of the art and these thresholds warrant significantly stronger safety assurances than what the policy currently requires.

Beyond this central issue, Anthropic’s RSP suffers from a lack of credibility, caused by last-minute changes that weakened the policy and by unclear language that makes it difficult for employees, outsiders, and the regulators capable of holding Anthropic accountable to understand.

Moving the Goalposts

The

... (truncated, 286 KB total)