Longterm Wiki
Updated 2026-04-12

Voluntary AI Commitments Enforcement


A well-structured, data-grounded analysis of voluntary AI safety commitments showing significant compliance gaps (17% average on model weight security; compliance declining from 69% to 45% across cohorts). The article correctly identifies that voluntary frameworks function as interim measures with real but insufficient enforcement mechanisms, and that political fragility (post-EO 14179) and structural competitive dynamics undermine their long-term viability as primary governance tools.


Quick Assessment

| Dimension | Assessment |
| --- | --- |
| Formal legal force | None — commitments are explicitly non-binding |
| Primary enforcement pathway | Reputational pressure + potential FTC Section 5 action for deceptive public pledges |
| Compliance record | First cohort (July 2023): ~69%; second cohort (Sept 2023): ~45% |
| Model weight security | Critically poor — average score of 17% across 16 companies analyzed, 11 companies scoring 0% |
| Best-performing area | Red-teaming and cybersecurity investment (self-reported) |
| Worst-performing area | Model weight protection; post-release output detection |
| Path to binding rules | Partial — some U.S. states now codify voluntary standards (NIST AI RMF, ISO/IEC 42001) into law |
| Political durability | Fragile — Biden-era executive orders revoked by Trump administration (EO 14179, January 2025) |
| Overall verdict | Meaningful interim step with real reputational stakes, but insufficient as primary governance mechanism |

Overview

Voluntary AI Safety Commitments are non-binding pledges made by leading AI developers to uphold safety, security, and transparency standards in the development of advanced generative AI models. The most prominent iteration was announced by the Biden-Harris Administration on July 21, 2023, when seven companies — Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI — publicly affirmed eight specific commitments at the White House. A second cohort, including Salesforce and seven other enterprise AI companies, joined in September 2023. By May 2024, the Seoul AI Safety Summit expanded the framework further, with 16 companies signing frontier AI safety commitments modeled on Responsible Scaling Policy (RSP) principles.

The commitments span three broad categories: safety (internal and external red-teaming for misuse, societal harms, biosecurity, and cybersecurity risks), security (protecting model weights from theft, establishing vulnerability reporting programs), and trust (watermarking AI-generated content, publishing model cards, sharing capability and limitation information publicly). Crucially, the scope is forward-looking: companies pledged these measures for generative models more powerful than their most advanced model at the time of signing, not retroactively for existing deployed systems.

These commitments were designed explicitly as interim measures, intended to remain in effect until substantially equivalent government regulations are enacted. They do not create new legal obligations and, by the companies' own framing, align with existing laws rather than extend beyond them. Critics note this framing creates a low bar: companies are effectively promising to continue practices they were already legally permitted to undertake, while the commitment dissolves once regulations appear — regulations whose content the companies will likely lobby to shape.


History

The July 2023 White House commitments emerged from a period of intense public attention to generative AI following the release of GPT-4 and Claude. The Biden-Harris Administration chose a voluntary approach rather than emergency rulemaking, reflecting both the speed of AI development and the political difficulty of passing AI-specific legislation through a divided Congress. This approach had precedent in U.S. self-regulatory tradition: the NIST AI Risk Management Framework (AI RMF), released in January 2023, had already established a voluntary standard that companies could adopt.

The September 2023 expansion added enterprise-focused commitments and, notably, new pledges specifically targeting AI-generated image-based sexual abuse material — including commitments on dataset sourcing, testing, and the removal of nude images from training data. In October 2023, the G7 Hiroshima AI Process produced a parallel set of international commitments emphasizing risk-based governance, and Anthropic joined the Munich AI Elections Accord in 2024 to address AI interference in elections.

The May 2024 Seoul AI Safety Summit marked a qualitative shift. Sixteen companies — including Anthropic, Google DeepMind, and OpenAI — signed the Seoul Frontier AI Safety Commitments, which were explicitly modeled on Responsible Scaling Policy frameworks. These commitments included if-then structures: companies pledged that if their models crossed specified capability thresholds (risk tripwires), they would halt deployment or take specific mitigation actions. This represented a more rigorous approach than the 2023 White House pledges, incorporating measurable triggers rather than open-ended aspirational language.

The Trump administration's revocation of Biden-era AI executive orders in January 2025 (EO 14179) removed the White House as an active sponsor of this framework, raising questions about whether the commitments would receive continued government attention. Companies have indicated they intend to maintain their internal standards regardless, but the reputational and political infrastructure supporting the commitments has weakened.


How It Works

The Commitment-by-Commitment Framework

The eight White House commitments can be tracked against what companies promised, what deliverables were expected, and what evidence of compliance exists:

| Commitment | What Was Promised | Expected Deliverable | Evidence Available | Assessment |
| --- | --- | --- | --- | --- |
| Internal red-teaming | Test models for misuse, societal harm, national security risks before release | Pre-deployment red team reports | Salesforce: 19 internal exercises; Anthropic: risk assessment framework described | Partially verified via self-report |
| External red-teaming | Engage third-party evaluators | Published external red team results | Salesforce: 2 external exercises; OpenAI Preparedness Framework references external evals | Sparse; largely self-reported |
| Cybersecurity protections | Protect model weights from theft; secure infrastructure | Technical controls, secure API storage | Salesforce: $18.9M bug bounty program; response steering technology described | Investment confirmed; weight security scores critically low (avg 17%) |
| Vulnerability reporting | Establish bounty programs for external researchers | Active bug bounty programs with public terms | Salesforce's $18.9M investment documented; UK/US vulnerability reporting tested as of February 2024 | Best-documented commitment |
| Information sharing | Share threat information with governments and peers | Participation in information-sharing mechanisms | Referenced in White House documents; no public disclosure of specific sharing instances | Weakly documented |
| Watermarking / labeling | Develop mechanisms to label AI-generated content | Deployed watermarking or content credentials | Commitments made; technical challenges in detection noted | Partial implementation; detection remains unsolved |
| Societal risk research | Invest in understanding societal harms | Published research on bias, safety, societal impacts | Google trust/safety policy for GenAI launches; Anthropic Constitutional AI papers | Ongoing, but commitment scope vague |
| Model cards / transparency | Publish information on model capabilities and limitations | Public model documentation | Model cards published by most major labs | Best-complied-with commitment |

Seoul Frontier AI Safety Commitments (May 2024)

The Seoul commitments introduced a more structured approach with 16 signatory companies. Key additions over the 2023 framework include:

  • If-then structures: Companies define capability thresholds; if a model crosses a tripwire (e.g., ability to meaningfully assist in bioweapons creation), the company must halt deployment, share findings with governments, or take specified mitigation actions.
  • Board-level involvement: Governance structures require senior leadership accountability for safety decisions.
  • External auditor ecosystem: Commitments reference the development of independent auditing capacity, though this infrastructure remains nascent.
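The if-then structure described above can be sketched as a simple threshold check. This is a minimal illustration only: the threshold names, trigger scores, and mitigation actions below are hypothetical and not drawn from any company's actual framework.

```python
from dataclasses import dataclass

@dataclass
class CapabilityThreshold:
    name: str                  # capability being tracked (hypothetical names)
    trigger_score: float       # evaluation score at which the tripwire fires
    required_actions: list[str]

def check_tripwires(eval_scores: dict[str, float],
                    thresholds: list[CapabilityThreshold]) -> list[str]:
    """Return the mitigation actions triggered by the current evaluations."""
    actions = []
    for t in thresholds:
        if eval_scores.get(t.name, 0.0) >= t.trigger_score:
            actions.extend(t.required_actions)
    return actions

# Hypothetical tripwires in the spirit of the Seoul commitments:
thresholds = [
    CapabilityThreshold("bio_uplift", 0.5,
                        ["halt_deployment", "notify_governments"]),
    CapabilityThreshold("autonomous_replication", 0.7,
                        ["halt_deployment", "enhanced_security"]),
]

# A model above the bio-uplift tripwire but below the autonomy one:
print(check_tripwires({"bio_uplift": 0.6, "autonomous_replication": 0.2},
                      thresholds))
# -> ['halt_deployment', 'notify_governments']
```

The point of the structure is that the trigger conditions and consequences are specified in advance, so a lapse is a detectable rule violation rather than a judgment call; the unresolved governance question is who sets the `trigger_score` values and who runs the evaluations.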

Company-Specific Frameworks

Beyond the government-sponsored pledges, several companies have developed their own internal frameworks that formalize commitment-like structures:

Anthropic's Responsible Scaling Policy (RSP) defines AI Safety Levels (ASLs) analogous to biosafety levels, specifying what capabilities trigger what protective measures. The RSP is updated periodically and is the most detailed public framework of this type, though critics note it is still self-authored and self-evaluated.

OpenAI's Preparedness Framework establishes a Preparedness team responsible for tracking catastrophic risks across biosecurity, cybersecurity, nuclear/radiological threats, and model autonomy. The framework includes a "scorecard" system for risk assessment, though the methodology and thresholds are determined internally.

Google DeepMind's Frontier Safety Framework defines Critical Capability Levels (CCLs) and specifies evaluation protocols. Like the others, it is self-administered.

Meta's Frontier AI Framework takes a somewhat different approach, including a public commitment to open-source specific model weights, which creates a different enforcement dynamic — once weights are released, the safety properties cannot be recalled.

Enforcement Pathways

The absence of formal enforcement does not mean enforcement is entirely absent. Several indirect mechanisms operate:

FTC Section 5 liability: Public pledges can constitute representations under the Federal Trade Commission Act. If a company publicly commits to a safety practice and then demonstrably fails to follow it while causing consumer harm, the FTC may characterize this as an unfair or deceptive act. The FTC's 2024 Operation AI Comply — which targeted companies making false claims about AI capabilities — demonstrated the agency's willingness to use existing consumer protection tools in the AI space, though this action addressed fraud rather than safety commitment violations.

Tort law pathway: Legal scholars have argued that voluntary commitments, once widely adopted, can constitute industry best practices. In negligence litigation, failure to meet an industry best practice can establish breach of the duty of care. This means a company that publicly commits to red-teaming and then deploys a model without it — which subsequently causes harm — may face liability under existing negligence doctrine.

State law integration: Several U.S. states have begun codifying voluntary frameworks into enforceable legal requirements. California requires alignment with recognized voluntary frameworks for certain AI uses. Colorado mandates risk management practices aligned with voluntary standards for high-risk AI systems, with affirmative defenses available to compliant companies. Texas offers affirmative legal defenses to companies that document compliance with frameworks such as the NIST AI RMF or ISO/IEC 42001. New York has empowered the state Attorney General to designate applicable standards.


Risks Addressed

Voluntary AI commitments target a specific cluster of risks associated with frontier AI systems:

Misuse risks: Red-teaming requirements directly address the risk that capable AI models could be used to assist in creating biological, chemical, or radiological weapons, conducting sophisticated cyberattacks, or enabling other mass-casualty harms. The biosecurity focus reflects the judgment — shared across Anthropic, OpenAI, and Google DeepMind — that frontier models represent a meaningful uplift risk for actors seeking to cause catastrophic harm.

Scheming and deceptive behavior: To a lesser degree, commitments on model evaluation and capability transparency address concerns about models that might misrepresent their capabilities or behave differently during evaluation than during deployment. External red-teaming in particular is intended to probe for behaviors that internal evaluators might miss.

Concentration of power and erosion of oversight: Watermarking and transparency requirements aim to preserve the ability of humans to identify AI-generated content, which is considered a prerequisite for maintaining meaningful societal oversight as AI outputs become more prevalent.

Systemic security risks: Model weight protection commitments address the risk that weights for the most capable systems could be stolen by state actors or criminal groups, removing the safety controls implemented by the original developers.

Societal-scale harms: Commitments on bias, discrimination, and societal risk research address diffuse harms that may not rise to the level of catastrophic risk but could erode trust, worsen inequality, or destabilize information environments at scale.


Limitations

The Enforcement Gap

The fundamental problem is structural: companies are simultaneously the entities making the commitments, implementing them, evaluating their own compliance, and reporting on progress. There is no independent auditor with access to model internals, training data, or red-team results. Self-reported progress — such as Salesforce's claim of a 35% reduction in toxic outputs following red-teaming — cannot be independently verified by the public, regulators, or researchers.

Compliance data that does exist is concerning. An analysis of 16 companies found systemic failure on model weight security commitments, with an average score of 17% and 11 companies scoring 0%. Overall, the first cohort of July 2023 signatories achieved approximately 69% compliance across tracked commitments, while the second cohort achieved only approximately 45% — suggesting that companies that joined later may have been less prepared or less committed to implementation.
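A quick sanity check makes the weight-security figures concrete: if 11 of the 16 companies score 0% and the overall average is 17%, the five remaining companies must carry the entire total, averaging roughly 54%. The short sketch below just verifies that arithmetic; it does not use any per-company data.

```python
# Implication of the reported figures: with 11 of 16 companies at 0%
# and an overall average of 17%, the nonzero scorers' average follows.
n_companies = 16
n_zero = 11
avg_score = 0.17

total = avg_score * n_companies              # sum of all 16 scores
nonzero_avg = total / (n_companies - n_zero) # spread over the 5 nonzero scorers

print(f"{nonzero_avg:.0%}")  # -> 54%
```

In other words, even the best-performing minority falls well short of full compliance on this commitment.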

Speed Mismatch

AI capabilities have advanced from GPT-2 to GPT-4 in roughly four years — a pace vastly faster than the decades-long trajectory of safety standard development in aviation, pharmaceuticals, or nuclear energy. The traditional pipeline from voluntary standard to industry norm to regulatory requirement operates on timescales measured in years or decades. Critics argue that this pipeline is simply too slow for a technology that may produce systems with qualitatively new capabilities within months.

Competitive Dynamics

Without binding external rules, companies face a structural prisoner's dilemma: safety investment is costly, and a company that cuts safety corners while competitors maintain standards may gain a competitive advantage. The commitments attempt to address this by creating mutual pledges, but without verification or penalties, defection is difficult to detect or sanction. This dynamic is explicitly acknowledged by former OpenAI board members who have argued that the industry cannot self-govern effectively and that binding regulations comparable to other high-risk industries are necessary.

Scope Limitations

The commitments apply only to models more powerful than each company's current most advanced model at the time of signing. This means the commitments did not apply retroactively to GPT-4, Claude 2, or Gemini at the time of the July 2023 pledges. As those models are deployed more widely over time, any harms they cause fall outside the commitment scope.

The company-specific RSP frameworks have their own scope limitations. They define thresholds for triggering enhanced safety measures, but the companies set those thresholds themselves. There is no external check on whether the thresholds are calibrated appropriately to the actual risk level.

Governance Gaps for Agentic AI

All existing commitment frameworks were designed with static, query-response AI systems in mind. They do not adequately address autonomous AI agents that can take multi-step actions, access external tools, and operate with limited real-time human oversight. Questions about scope boundaries, permission structures, and accountability for multi-step agentic actions are largely unaddressed by current voluntary frameworks, even as agentic deployments become common.

Political Fragility

The Biden-era executive order that provided the political infrastructure for these commitments was revoked in January 2025. The White House is no longer an active enforcer of even the reputational stakes attached to the commitments. Companies have indicated they will maintain their internal standards regardless, but the signal from government is now that AI development speed is prioritized over precautionary safety measures — a shift that may erode internal institutional support for costly safety investments over time.


Criticisms and Concerns

The most pointed criticism is that voluntary commitments function as what researchers have called ethics washing or safety washing — providing the appearance of responsible governance without the substance. The Google Project Maven case is frequently cited as a cautionary example: in 2018, following employee protests, Google pledged to avoid AI use in weapons and surveillance. In 2021, the company signed a $1.2 billion contract with Israel (Project Nimbus) for national security applications, and it officially removed the 2018 commitment from its AI Principles in February 2025.

The EA Forum and LessWrong communities have expressed broad skepticism toward voluntary commitments, with community members characterizing them as easily breakable pledges that reflect PR value rather than substantive safety action. AI Lab Watch evaluations — which attempt to score labs on safety-relevant behaviors — have been criticized for over-rewarding non-binding commitments relative to measurable technical actions.

A related concern is that voluntary commitments may actively impede stronger regulation. By providing a credible-sounding governance narrative, they reduce political pressure for binding rules and give companies a template to point to when legislators propose more stringent requirements. The 2025 congressional proposal for a 10-year moratorium on state AI laws — stripped from the budget bill by a 99-1 Senate vote — illustrated how industry actors use existing governance frameworks to argue against new mandates.

Finally, the opt-out problem: companies that decline to join voluntary commitment frameworks face no consequences. Research on why some AI companies join non-binding safety agreements while others decline suggests that participation often reflects strategic positioning — the ability to influence the norms being set — rather than a uniform commitment to the underlying safety practices.


Key Uncertainties

  • Whether FTC enforcement will be applied: The FTC's Section 5 pathway has been articulated by legal scholars but not yet tested against a voluntary AI commitment violation specifically. The current administration's posture toward AI regulation makes this less likely near-term.
  • Whether RSP frameworks set appropriate thresholds: Companies self-define the capability levels that trigger enhanced safety measures. Whether those thresholds are calibrated to actual risk is not independently verifiable.
  • Long-term compliance trajectory: The declining compliance trend from first to second cohort (69% to 45%) may reflect selection effects, learning curves, or genuine erosion of commitment. Longer-term data would clarify which interpretation is correct.
  • Effectiveness of state-level codification: As states like Colorado, Texas, and California incorporate voluntary standards into law, the practical enforceability of those provisions will depend on litigation outcomes that have not yet materialized.
  • Whether agentic AI creates commitment voids: As frontier labs deploy increasingly autonomous systems, the gap between existing commitment frameworks and actual risk landscapes may widen substantially.


Related Wiki Pages

  • Analysis: AI Governance Effectiveness Analysis
  • Approaches: Corporate AI Safety Responses; Responsible Scaling Policies
  • Organizations: Google DeepMind; Frontier Model Forum; Government AI Actors Overview
  • Concepts: Governance Overview
  • Historical: International AI Safety Summit Series
  • Policy: US Executive Order on Safe, Secure, and Trustworthy AI