AI Lab Watch: Commitments Tracker
ailabwatch.org/resources/commitments
Useful for researchers and policymakers tracking the gap between AI lab safety rhetoric and demonstrated practice; complements formal regulatory frameworks by documenting voluntary commitments.
Metadata
Importance: 62/100 · tool page · reference
Summary
AI Lab Watch's Commitments Tracker monitors and evaluates the public safety commitments made by major AI laboratories, tracking whether frontier AI companies are honoring pledges related to safety, governance, and responsible deployment. It serves as an accountability tool by systematically documenting what labs have promised and assessing follow-through.
Key Points
- Tracks public safety commitments made by frontier AI labs such as OpenAI, Anthropic, and DeepMind
- Provides accountability by comparing stated commitments against actual lab behavior and policies
- Covers areas including safety testing, whistleblower protections, government reporting, and international coordination
- Serves as a watchdog resource for civil society and policymakers evaluating AI lab trustworthiness
- Aggregates commitments from voluntary pledges, government agreements, and industry frameworks in one reference
Cited by 7 pages
| Page | Type | Quality |
|---|---|---|
| Corporate Influence on AI Policy | Crux | 66.0 |
| Corporate AI Safety Responses | Approach | 68.0 |
| AI Policy Effectiveness | Analysis | 64.0 |
| International AI Safety Summit Series | Event | 63.0 |
| AI Lab Safety Culture | Approach | 62.0 |
| Tool-Use Restrictions | Approach | 91.0 |
| AI Whistleblower Protections | Policy | 63.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 9, 2026 · 14 KB
AI Lab Watch
This page collects AI companies' commitments relevant to AI safety and extreme risks.
Commitments by several companies
White House voluntary commitments
The White House voluntary commitments were joined by Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI in July 2023; Adobe, Cohere, IBM, Nvidia, Palantir, Salesforce, Scale AI, and Stability AI in September 2023; and Apple in July 2024.
The commitments "apply only to generative models that are overall more powerful than the current most advanced model produced by the company making the commitment," but all relevant companies have created more powerful models since making the commitments. The commitments most relevant to safety are:
"[I]nternal and external red-teaming of models or systems in areas including misuse, societal risks, and national security concerns, such as bio, cyber, [autonomous replication,] and other safety areas."
"[A]dvanc[e] ongoing research in AI safety, including on the interpretability of AI systems' decision-making processes and on increasing the robustness of AI systems against misuse."
"Work toward information sharing among companies and governments regarding trust and safety risks, dangerous or emergent capabilities, and attempts to circumvent safeguards": "establish or join a forum or mechanism through which they can develop, advance, and adopt shared standards and best practices for frontier AI safety."
"Invest in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights": "limit[] access to model weights to those whose job function requires it and establish[] a robust insider threat detection program consistent with protections provided for their most valuable intellectual property and trade secrets. In addition . . . stor[e] and work[] with the weights in an appropriately secure environment to reduce the risk of unsanctioned release."
"Incent third-party discovery and reporting of issues and vulnerabilities": establish "bounty systems, contests, or prizes to incent the responsible disclosure of weaknesses, such as unsafe behaviors, or [] include AI systems in their existing bug bounty programs."
"Publicly report model or system capabilities, limitations, and domains of appropriate and inappropriate use, including discussion of societal risks, such as effects on fairness and bias": "publish reports for all new significant model public releases . . . . These reports should include the safety evaluations conducted (including in areas such as dangerous capabilities, to the extent that these are responsible to publicly disclose) . . . and the results of adversarial testing conducted to evaluate the model's fitness for deployment" and "red-teaming and safety procedures."
Some companies announced and commented on their commitments, including Microsoft (which simultaneously made additional commitments), Google,
... (truncated, 14 KB total)
Resource ID: 91ca6b1425554e9a | Stable ID: sid_96geMqeZUr