Lab Behavior & Industry
- OpenAI has compressed safety evaluation timelines from months to just a few days, with evaluators reporting a 95%+ reduction in testing time for models like o3 compared to GPT-4's 6+ month evaluation period. (S: 4.5, I: 4.5, A: 4.0)
- AI labs demonstrate only 53% average compliance with voluntary White House commitments, with model weight security at just 17% compliance across 16 major companies. (S: 4.0, I: 4.0, A: 4.0)
- The capability gap between open-source and closed AI models has narrowed dramatically, from 16 months in 2024 to approximately 3 months in 2025, with DeepSeek R1 achieving o1-level performance at 15x lower cost. (S: 4.0, I: 4.0, A: 3.5)
Quick Assessment
| Dimension | Assessment | Evidence |
|---|---|---|
| Overall Compliance | Mixed (53% average) | August 2025 study of 16 companies found significant variation; OpenAI scored 83%, average was 53% |
| Evaluation Timeline Trend | Declining | OpenAI reduced testing from months to days for some models; FT reports "weeks" compressed to "days" |
| Safety Team Retention | Concerning | 25+ senior departures from OpenAI in 2024; Superalignment team dissolved |
| Transparency | Inadequate | Google Gemini 2.5 Pro released without model card; OpenAI GPT-4.1 released without technical safety report |
| Open-Source Gap | Narrowing | Gap reduced from 16 months to approximately 3 months in 2025; DeepSeek R1 achieved near-parity |
| External Red Teaming | Standard but Limited | 750+ researchers engaged via HackerOne; 15-30 day engagement windows may be insufficient |
| Whistleblower Protection | Underdeveloped | Only OpenAI has published full policy (after media pressure); California SB 53 protections start 2026 |
Overview
This page tracks measurable indicators of AI laboratory behavior, safety practices, and industry transparency. These metrics help assess whether leading AI companies are following responsible development practices and honoring their public commitments.
Understanding lab behavior is critical because corporate practices directly influence AI safety outcomes. Even the best technical safety research is insufficient if labs are racing to deploy systems without adequate testing, suppressing internal safety concerns, or failing to disclose dangerous capabilities.
Lab Behavior Dynamics
1. Voluntary Commitment Compliance Rate
Current Status (2025): Mixed compliance, with significant variation across companies and commitment types.
Key Findings
A comprehensive study from August 2025 (Wang et al., 2025, arXiv) examining companies' adherence to their White House voluntary AI commitments found significant variation:
| Cohort | Companies | Mean Compliance | Range |
|---|---|---|---|
| First (July 2023) | Amazon, Anthropic, Google, Inflection, Meta, Microsoft, OpenAI | 69.0% | 50-83% |
| Second (Sept 2023) | Adobe, Cohere, IBM, Nvidia, Palantir, Salesforce, Scale AI, Stability AI | 44.6% | 25-65% |
| Third (July 2024) | Apple | Not fully assessed | N/A |
| Overall Average | 16 companies | 53% | 17-83% |
Compliance by Commitment Type
| Commitment Area | Average Compliance | Companies at 0% | Best Performer |
|---|---|---|---|
| Model weight security | 17% | 11 of 16 (69%) | Anthropic (75%) |
| Third-party reporting | 34.4% | 8 of 16 (50%) | OpenAI (100%) |
| Red teaming | 62% | 3 of 16 (19%) | OpenAI (100%) |
| Watermarking | 48% | 6 of 16 (38%) | Google (85%) |
| Safety research sharing | 71% | 2 of 16 (13%) | Multiple (100%) |
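The cohort and per-commitment averages above are straightforward unweighted means over rubric scores. A minimal sketch of that aggregation, using made-up placeholder scores rather than the study's actual data:

```python
from statistics import mean

# scores[company][commitment] = fraction of rubric points earned (0.0-1.0).
# Company names are real; the numbers below are placeholders, not the study's data.
scores = {
    "OpenAI":    {"weight_security": 0.25, "red_teaming": 1.00, "third_party_reporting": 1.00},
    "Anthropic": {"weight_security": 0.75, "red_teaming": 0.80, "third_party_reporting": 0.50},
    "Meta":      {"weight_security": 0.00, "red_teaming": 0.60, "third_party_reporting": 0.00},
}

def company_average(company: str) -> float:
    """Unweighted mean across all commitment areas for one company."""
    return mean(scores[company].values())

def commitment_average(commitment: str) -> float:
    """Unweighted mean across all companies for one commitment area."""
    return mean(s[commitment] for s in scores.values())

for c in scores:
    print(f"{c}: {company_average(c):.0%}")
print(f"Model weight security average: {commitment_average('weight_security'):.0%}")
```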
Related Commitments
In May 2024, 16 AI companies joined the Frontier AI Safety Commitments, pledging to develop Responsible Scaling Policies (RSPs) by February 2025. Many companies did publish RSPs, though the quality and specificity vary significantly.
Concerning Developments
In April 2025, OpenAI removed a provision from its Preparedness Framework without noting the change in the changelog, raising transparency concerns.
Data Quality: Based on detailed public rubric scoring of disclosed company behavior. Limitations include reliance on public disclosure and potential for selective transparency.
2. RSP Capability Threshold Crossings
Current Status (2025): First ASL-3 activation announced for Claude Opus 4. No other publicly confirmed threshold crossings, though evaluation methodologies remain contested.
Anthropic's RSP Framework
Anthropic pioneered the Responsible Scaling Policy approach in September 2023, with the policy now at Version 2.2 (effective May 14, 2025):
| Version | Effective Date | Key Changes |
|---|---|---|
| 1.0 | September 19, 2023 | Initial framework with ASL levels |
| 2.0 | October 15, 2024 | Shifted to qualitative thresholds; safety case methodology |
| 2.1 | March 31, 2025 | Clarified thresholds beyond ASL-3 |
| 2.2 | May 14, 2025 | Amended insider threat scope for ASL-3 Security Standard |
ASL-3 Activation (2025)
Anthropic activated ASL-3 protections for Claude Opus 4, representing the first publicly confirmed capability threshold crossing:
- ASL-3 Security Standard: Increased internal security measures to protect model weights
- ASL-3 Deployment Standard: Targeted measures limiting CBRN weapons misuse risk
- Precautionary basis: Anthropic has not definitively confirmed Claude Opus 4 crossed the capability threshold, but could not "clearly rule out ASL-3 risks"
Key Threshold Domains
| Domain | ASL-2 Threshold | ASL-3 Threshold | Current Status |
|---|---|---|---|
| CBRN capabilities | Basic refusals | Sophisticated non-state attacker resistance | Claude Opus 4 at ASL-3 |
| Autonomous AI R&D | No automation | 1000x scaling acceleration | Not crossed |
| Cybersecurity | Basic vulnerability knowledge | Advanced exploitation assistance | Monitoring |
| Model weight security | Opportunistic theft defense | Sophisticated attacker defense | ASL-3 for Opus 4 |
Recent Changes and Concerns
Version 2 shift: Anthropic moved from quantitative benchmarks to qualitative descriptions of capability levels. Critics note this reduces verifiability.
Grade decline: According to SaferAI, Anthropic's grade dropped from 2.2 to 1.9, placing them in the "weak" category alongside OpenAI and DeepMind. The primary concern is the shift away from precisely defined thresholds.
Institute for AI Policy and Strategy recommendation: Companies should define verifiable risk thresholds informed by "societal risk" tolerances from other industries. Current thresholds may be too lenient.
Detection Challenges
Small improvements in elicitation methodology can dramatically increase scores on evaluation benchmarks. Naive elicitation strategies may significantly underreport risk profiles, potentially missing dangerous capabilities that sophisticated actors could unlock.
Data Quality: Limited public disclosure of evaluation results. Companies control both the design and disclosure of dangerous capability evaluations, creating incentives to underreport concerning findings.
3. Time Between Model Training and Safety Evaluation
Current Status (2025): Decreasing evaluation windows, with some tests compressed from months to days.
Shortened Timelines
The Financial Times reported that OpenAI has been slashing safety evaluation time, giving testers "just a few days for evaluations that had previously been allotted weeks or months to be completed."
| Model | Reported Evaluation Time | Historical Comparison | Source |
|---|---|---|---|
| GPT-4 (2023) | 6+ months | Baseline | OpenAI system card |
| o3 (2025) | Less than 1 week | 95%+ reduction | FT sources |
| GPT-4.1 (2025) | No technical safety report | N/A | OpenAI statement |
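The "95%+ reduction" figure follows directly from the reported windows; a back-of-envelope check, assuming roughly 180 days for GPT-4 and under a week for o3:

```python
# Rough arithmetic check of the ">95% reduction" claim, assuming GPT-4's
# evaluation window was about 180 days (6+ months) and o3's under 7 days.
baseline_days = 180
compressed_days = 7
reduction = 1 - compressed_days / baseline_days
print(f"Reduction in evaluation time: {reduction:.1%}")  # ~96.1%
```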
This compression creates severe constraints on thorough safety testing:
- Complex evaluations require substantial time to design and execute
- Emergent capabilities may only become apparent through extended testing
- Red teams need adequate access to explore edge cases and failure modes
- One evaluator told FT: "We had more thorough safety testing when [the technology] was less important"
Pre-Deployment Testing Examples
OpenAI o1 evaluation (December 2024): US AISI and UK AISI conducted joint pre-deployment testing during a "limited period of pre-deployment access." Testing was conducted by expert engineers and scientists across three domains:
- Cyber capabilities
- Biological capabilities
- Software and AI development
The evaluators noted that testing was "conducted in a limited time period with finite resources, which if extended could expand the scope of findings."
Evaluator Access Challenges
METR and other evaluation organizations report that comprehensive risk assessments require:
- Substantial expertise and specialized knowledge
- Direct access to models and training data
- More time than companies typically provide
- Information about technical methodologies that companies often withhold
METR argues that powerful AI systems "are not ordinary products" and risks should be addressed throughout the whole AI development lifecycle. They advocate for "earlier evaluations for dangerous capabilities, better forecasting of AI capabilities prior to training, and more emphasis on security and safety throughout development."
AI Safety Index Findings
The 2025 AI Safety Index from the Future of Life Institute found:
- "AI companies are unlikely to make high-assurance safety cases if timelines are short"
- "AI developers control both the design and disclosure of dangerous capability evaluations, creating inherent incentives to underreport alarming results"
- "Naive elicitation strategies cause significant underreporting of risk profiles"
- The gap between capabilities acceleration and risk management practice is widening
Data Quality: Limited public data on specific evaluation timelines. Most information comes from investigative journalism, evaluator reports, and company transparency documents.
4. External Red-Team Engagement Rate
Current Status (2025): External red teaming is standard practice at major labs, but engagement scope and disclosure vary significantly.
Major Lab Practices
OpenAI: Conducts pre-deployment adversarial testing by vetted external experts. External red teamers identify alignment failures, injection vectors, tool misuse paths, and safety regressions. Findings inform mitigation strategies and deployment decisions.
Notable recent engagements: ControlPlane's Torin van den Bulk contributed to external red team testing on GPT-4o, Operator, o3-mini, and Deep Research, with live access to model checkpoints.
HackerOne Platform: Provides structured AI Red Teaming (AIRT) as 15- or 30-day engagements. Over 750 AI-focused researchers contribute to engagements for frontier labs including Anthropic, Snap, and Adobe. HackerOne has tested 1,700+ AI assets across customer scopes.
Key Vulnerabilities Found
From HackerOne's aggregated testing data across 1,700+ AI assets:
| Vulnerability Type | Frequency | Severity | Notes |
|---|---|---|---|
| Cross-tenant data leakage | Found in nearly all enterprise tests | Critical | Highest priority concern |
| Prompt injection | 75%+ of tested models | High | Frequently bypasses safety filters |
| Jailbreak exploits | Variable | High | Success rates vary by methodology |
| Unsafe outputs | Common | Medium-High | Various categories of harmful responses |
Anthropic Jailbreak Challenge Results (2025)
Anthropic partnered with HackerOne to test Constitutional Classifiers on Claude 3.5 Sonnet:
- 300,000+ chat interactions from 339 participants
- $55,000 in bounties paid to four successful teams
- One team found a universal jailbreak passing all levels
- One team found a borderline-universal jailbreak
- Two teams passed all eight levels using multiple individual jailbreaks
Testing Metrics
| Metric | Description | Industry Benchmark |
|---|---|---|
| Attack success rate (ASR) | Percentage of successful bypass attempts | Varies: 0% to 63% at 100 attempts |
| Mean time to detect (MTTD) | Time to discover vulnerabilities | 10 min to 7+ hours |
| Mean time to remediate (MTTR) | Time to fix discovered issues | Not publicly disclosed |
| Attack success at 200 attempts | Multi-attempt bypass rate | Claude Opus 4.5: 0% (computer use) to 63% (coding) |
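For intuition on why multi-attempt figures such as "attack success at 200 attempts" can far exceed single-attempt rates, the sketch below shows the standard at-least-one-success calculation under an independence assumption. The published 0%-63% figures are empirical measurements, not outputs of this formula; the example probability is purely illustrative.

```python
def asr_at_k(per_attempt_p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds."""
    return 1 - (1 - per_attempt_p) ** k

# Example: a 0.5% per-attempt success rate compounds to roughly 63% over
# 200 independent attempts (illustrative only, not a reported measurement).
print(f"ASR@200 with p = 0.5% per attempt: {asr_at_k(0.005, 200):.1%}")
```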
Government Framework
CISA defines AI red teaming as a subset of AI Testing, Evaluation, Verification and Validation (TEVV). NIST has operationalized this through programs like Assessing Risks and Impacts of AI (ARIA) and the GenAI Challenge.
Limitations
While external red teaming is increasingly common, critical gaps remain:
- Limited disclosure of red team findings and remediation actions
- Selective engagement: Labs choose which red teamers to work with
- Short engagement windows: 15-30 days may be insufficient for complex systems
- Post-deployment gaps: Less emphasis on continuous adversarial testing after launch
Data Quality: Some public information from lab announcements and red team providers. Comprehensive engagement rates and detailed findings remain largely non-public.
5. Dangerous Capability Disclosure Delays
Current Status (2025): Significant and increasing delays, with some major model releases lacking safety documentation entirely.
Google Gemini 2.5 Pro (March 2025)
Google released Gemini 2.5 Pro without a model card, violating commitments made to the U.S. government and at international AI safety summits (Fortune):
| Timeline | Event | Notes |
|---|---|---|
| March 2025 | Gemini 2.5 Pro released | No model card published |
| 3 weeks later | Simplified 6-page model card | Called "meager" and "worrisome" by AI governance experts (Fortune) |
| Late June 2025 | Detailed report published | Months after full release |
Government Response: 60 U.K. politicians signed an open letter accusing Google DeepMind of "a troubling breach of trust with governments and the public" and a "failure to honour" international commitments (Fortune).
Google's Defense: The company claimed Gemini 2.5 Pro was an "experimental" release, exempting it from normal documentation requirements.
OpenAI Documentation Gaps
- Deep Research model: Released without a system card, which was published weeks later
- GPT-4.1: OpenAI announced it would not publish a technical safety report, arguing the model is "not a frontier model"
Broader Industry Pattern
Meta Llama 4: Model card was similarly brief and limited in detail, drawing criticism from AI safety researchers.
Systemic Issues: The AI Safety Index found that "AI developers control both the design and disclosure of dangerous capability evaluations, creating inherent incentives to underreport alarming results or select lenient testing conditions that avoid costly deployment delays."
Transparency Requirements
While voluntary commitments emphasize transparency, actual disclosure practices show significant gaps:
- Limited disclosure of evaluation methodologies
- Weak evidence of systematic safety processes
- Uneven adoption of robust evaluation practices
New Legal Requirements
California's Transparency in Frontier AI Act (effective 2026) establishes:
- Transparency requirements for large AI developers
- Mandatory reporting of critical safety incidents to state attorney general
- Whistleblower protections for employees reporting risks
Data Quality: Based on public monitoring by AI governance organizations, investigative journalism, and government oversight. Actual capability evaluation results remain largely proprietary.
6. Pre-Deployment Safety Testing Duration
Current Status (2025): Highly variable and generally decreasing. No standardized minimum testing period exists.
Testing Approaches
Major frontier AI labs follow safety policies that include pre-deployment testing:
- OpenAI's Preparedness Framework (Version 2, April 2025)
- Google DeepMind's Frontier Safety Framework
- Anthropic's Responsible Scaling Policy (Version 2.2, May 2025)
Third-party evaluators (UK AISI, US AISI, Apollo Research, METR) also conduct pre-deployment assessments, though their access and time are limited. METR's analysis of 12 companies with published frontier AI safety policies found variable commitment levels.
Known Testing Examples
OpenAI o1 (December 2024): Joint US AISI and UK AISI evaluation during a "limited period" before public release. Specific duration not disclosed publicly.
Safeguard Testing Benchmarks: Research examples show wide variation in time requirements:
- First vulnerability test: 10 minutes of expert red teamer time
- Second test (novel universal jailbreak): Over 7 hours of expert effort
Industry Trends
The 2025 AI Safety Index concluded that:
- Pre-deployment testing is "likely necessary but insufficient" for responsible AI development
- Testing is conducted with "limited time periods and finite resources"
- "If timelines are short, AI companies are unlikely to make high-assurance safety cases"
Comparison to Other Industries
Unlike pharmaceuticals (multi-year clinical trials) or aerospace (extensive certification processes), AI systems lack:
- Standardized testing protocols
- Minimum duration requirements
- Independent verification mandates
- Clear pass/fail criteria for deployment
Data Quality: Very limited public data. Specific testing durations are rarely disclosed. Assessment based on general industry reports and occasional third-party evaluator statements.
7. Model Release Velocity
Current Status (2025): Unprecedented acceleration, with major labs releasing frontier models within weeks of each other.
Release Frequency Trends
Section titled βRelease Frequency Trendsβ2024 Baseline: Major labs typically released frontier models annually or semi-annually.
2025 Acceleration: "Companies that typically released major models annually or semi-annually were now shipping frontier models within weeks of each other." Each release incorporated learnings from the previous week's competitive announcements.
November-December 2025: "Tit-for-Tat Arms Race"
In just 25 days, four major AI companies launched their most powerful models:
| Date | Company | Model | Benchmark Performance |
|---|---|---|---|
| November 17 | xAI | Grok 4.1 | Top on select reasoning tasks |
| November 18 | Google | Gemini 3 | Topped multiple leaderboards |
| November 24 | Anthropic | Claude Opus 4.5 | 80%+ on SWE-Bench Verified |
| December 11 | OpenAI | GPT-5.2 | Competitive across benchmarks |
This concentration represented "a compression of innovation never before seen in technology history." OpenAI's Sam Altman issued an internal "code red" memo (TechCrunch) after Gemini 3 topped leaderboards, with internal sources reporting that some employees asked for delays but "competitive pressure forced the accelerated timeline."
2025 Release Summary
OpenAI:
- GPT-5 with improved coding and "thinking" mode
- GPT-5.1 Codex Max (agentic coding model)
- GPT-5.2
- gpt-oss-120b and gpt-oss-20b (open-weight models)
- Dozens of feature launches (GPT-4o Image, standalone Sora app, group chats)
Anthropic:
- Claude 4 family (Opus and Sonnet)
- Claude Opus 4.5 (November 24)
- Claude 4.5 Haiku
Google DeepMind:
- Gemini 2.5 (March)
- Gemini 3 (November)
- Gemini 2.5 Deep Think
- Genie 3.0 (world model)
Meta & Others:
- DeepSeek R1 (January 20, 2025) - major open model impact
- Qwen 3 and various Chinese lab releases
Tracking Data
Section titled βTracking Dataβ- AI Flash Report: Tracked 43 model releases as of October 27, 2025
- Our World in Data: Tracks large-scale AI systems (>10Β²Β³ FLOP training compute)
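As a rough illustration of how a compute threshold like 10^23 FLOP can be checked, the sketch below uses the common C ≈ 6·N·D approximation for dense-transformer training compute; the parameter and token counts are illustrative assumptions, not reported figures for any tracked model.

```python
# Rough sketch: approximate training compute as 6 * parameters * tokens
# (standard dense-transformer estimate) and compare against a tracking threshold.
THRESHOLD_FLOP = 1e23

def training_flop(params: float, tokens: float) -> float:
    """Approximate training compute in FLOP for a dense transformer."""
    return 6 * params * tokens

# Illustrative assumption: a 70B-parameter model trained on 2T tokens.
compute = training_flop(params=70e9, tokens=2e12)
print(f"{compute:.2e} FLOP -> above 1e23 threshold: {compute > THRESHOLD_FLOP}")
```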
Safety Implications
Rapid release velocity creates pressure that:
- Reduces time available for safety evaluation
- Encourages "shipping within weeks" competitive dynamics
- Creates feedback loops of rapid iteration
- May prioritize "shiny products" over safety culture
Data Quality: Good tracking of major model releases through multiple sources. Precise internal development timelines remain proprietary.
8. Open-Source vs Closed Model Capability Gap
Current Status (2025): Gap narrowing significantly, from approximately 16 months in 2024 to approximately 3 months in late 2025.
Current Gap Estimate
Epoch AI research from October 2025 found:
| Metric | 2024 Estimate | 2025 Estimate | Trend |
|---|---|---|---|
| Average lag time | 16 months | 3 months | Narrowing rapidly |
| ECI gap (capability index) | 15-20 points | 7 points | Narrowing |
| Benchmark parity domains | Limited | Most key benchmarks | Expanding |
| Enterprise use gap | Significant | ~15 percentage points on SWE-Bench Verified | Narrowing |
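One way such a "lag time" can be estimated is to ask, for a given benchmark score, when the best closed model first reached it versus when the best open model did. A minimal sketch with made-up (date, score) points, not Epoch AI's actual data:

```python
from datetime import date

# Made-up (date, score) points for a single benchmark; placeholders only.
closed_models = [(date(2024, 3, 1), 60.0), (date(2024, 9, 1), 70.0), (date(2025, 3, 1), 80.0)]
open_models   = [(date(2024, 9, 1), 58.0), (date(2025, 1, 20), 71.0), (date(2025, 6, 1), 79.0)]

def first_reaching(series, target):
    """Earliest date at which the running-best score meets the target, else None."""
    best = float("-inf")
    for day, score in series:
        best = max(best, score)
        if best >= target:
            return day
    return None

target = 70.0
lag = first_reaching(open_models, target) - first_reaching(closed_models, target)
print(f"Estimated lag at score {target}: {lag.days} days (~{lag.days / 30:.1f} months)")
```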
Specific Example: Meta's Llama 3.1 405B (released July 2024) matched the capabilities of GPT-4's first version roughly 16 months after that model's release.
Gap Narrowing Evidence
2024: The ecosystem relied primarily on Llama 3, with Qwen2.5 and DeepSeek known mainly to specialists.
2025: DeepSeek R1 (released January 20, 2025) achieved performance parity with OpenAI's o1 while operating at roughly 15x lower cost, with a reported training cost of just $5.6 million. The open ecosystem "immensely accelerated in terms of capabilities, rivaling closed models on most key benchmarks."
Meta's Claims
Meta described Llama 3.1 as "the first frontier-level open source AI model," claiming it "outperforms GPT-4o and Anthropic's Claude 3.5 on several benchmarks" in internal evaluations.
Remaining Closed Model Advantages
Highest-end performance: GPT-4 and newer models remain more capable on complex tasks requiring deep reasoning.
Enterprise benchmarks: On SWE-Bench Verified (real GitHub issue fixes):
- State-of-the-art closed models (GPT-5.2-Codex, Claude Opus 4.5): 80%+
- Top open-source models: 65%
- This gap is described as "critical for enterprise use"
Open Model Strengths
Niche verticals: Open-source models lead in biomedicine, law, and defense applications where institutional constraints (privacy, security, customization) matter more than raw performance.
Cost-effectiveness: Significantly cheaper for customization and fine-tuning.
Adoption trends: According to a16z research:
- 41% of surveyed enterprises will increase use of open-source models
- Another 41% will switch from closed to open if performance matches
Open Source Definition Debate
Meta's Llama models don't meet the Open Source Initiative's definition, which requires sharing:
- Model weights (Meta provides this)
- Training data (Meta does not provide this)
- Training code (Meta does not provide this)
Google Internal Assessment
A leaked 2023 Google memo warned: "we aren't positioned to win this arms race ... I'm talking, of course, about open source. Plainly put, they are lapping us."
Data Quality: Based on benchmark comparisons, research organization analysis (Epoch AI), and industry reports. Benchmarks may not capture all relevant capability dimensions.
9. Lab Safety Team Turnover Rate
Current Status (2025): Specific turnover rates not publicly disclosed, but high-profile departures suggest significant retention challenges at leading labs.
OpenAI Safety Departures (2024-2025)
May 2024 - Superalignment Team Dissolution (CNBC):
| Name | Role | Departure Date | Public Statement |
|---|---|---|---|
| Ilya Sutskever | Co-founder, Chief Scientist, Superalignment co-lead | May 14, 2024 | No public criticism |
| Jan Leike | Head of Alignment, Superalignment co-lead | May 2024 | "Safety culture... took a backseat to shiny products" |
| Daniel Kokotajlo | Safety researcher | April 2024 | "Resigned in protest after losing confidence" |
| Leopold Aschenbrenner | Safety researcher | 2024 | Reportedly fired for leaking information |
| William Saunders | Safety researcher | 2024 | No public statement |
Jan Leike's criticism: "Over the past few months my team has been sailing against the wind. Sometimes we were struggling for [computing resources]" despite OpenAI's promise to allocate 20% of compute to Superalignment. He joined Anthropic.
Result: The entire Superalignment team was disbanded, with members reassigned to other teams.
September 2024 Leadership Exits:
- Mira Murati (CTO, 6 years at OpenAI)
- Bob McGrew (Chief Research Officer)
- Barret Zoph (VP of Research)
- Hannah Wong (Chief Communications Officer)
- Tom Cunningham (Economics Researcher)
- Miles Brundage (Policy Research Head)
Total documented senior departures: 25+ as of December 2024
June 2024 Open Letter
Nine current and former OpenAI employees wrote an open letter criticizing the company for "recklessly racing" to build AGI. Daniel Kokotajlo spoke out despite OpenAI initially conditioning his equity (worth ~$1.7 million) on non-disparagement agreement compliance.
Anthropic
Anthropic has positioned itself as a safety-focused alternative and received several high-profile hires from OpenAI, including Jan Leike. Specific turnover data not publicly available.
Industry-Wide Context
General corporate AI employee retention challenges in 2025:
- High demand for AI talent creates strong external offers
- Burnout from rapid development pace
- Philosophical disagreements over safety prioritization
Whistleblower Issues
The 2025 AI Safety Index noted: "Public whistleblowing policies are a common best practice in safety-critical industries. Yet, among the assessed companies, only OpenAI has published its full policy, and it did so only after media reports revealed the policy's highly restrictive non-disparagement clauses."
Data Quality: Very limited. Based on public departure announcements, investigative journalism, and open letters. Internal turnover rates for safety-specific teams are not disclosed. No denominator data (total safety team size over time) publicly available.
10. Whistleblower Reports from AI Labs
Current Status (2025): Small but growing number of public whistleblower reports, primarily from OpenAI. Structural barriers remain significant.
"The OpenAI Files" (June 2025)
A comprehensive report compiled by the Midas Project and Tech Oversight Project tracking issues with governance, leadership, and safety culture at OpenAI. It drew on:
- Legal documents and complaints
- Social media posts
- Media reports
- Open letters and insider accounts
Described as "the most comprehensive collection to date of documented concerns with governance practices, leadership integrity, and organizational culture at OpenAI."
Notable Whistleblower Cases
Daniel Kokotajlo (2024): Spoke publicly despite a non-disparagement agreement that initially conditioned ~$1.7 million in equity on his silence. Resigned "in protest after losing confidence in the company."
Jan Leike (2024): While not technically a whistleblower (as he departed to Anthropic), publicly criticized OpenAI on X, stating safety "took a back seat to shiny products" and the team was under-resourced.
Nine-Person Open Letter (June 2024): Current and former OpenAI employees criticized the company for "recklessly racing" toward AGI.
Structural Barriers to Whistleblowing
Non-Disparagement Agreements: OpenAI initially used agreements that conditioned equity vesting on non-criticism. This practice was exposed and modified after public backlash.
Whistleblower Policy Gaps: The AI Safety Index found that only OpenAI published its full whistleblowing policy, and only after media scrutiny revealed restrictive clauses.
Industry Comparison: Most AI labs lack public whistleblower policies comparable to safety-critical industries like aviation or nuclear power.
Cross-Lab Safety Criticism
In 2025, AI safety researchers from OpenAI, Anthropic, and other organizations publicly criticized xAI's "reckless" and "completely irresponsible" safety culture following company scandals.
New Protections
California SB 53 (supported by Anthropic): Provides whistleblower protections for employees reporting AI-related risks or safety concerns to authorities. Effective January 1, 2026.
California AI Safety Act: Establishes protections from retaliation for reporting AI-related risks.
Cross-Lab Evaluation Initiative (2025)
In early summer 2025, Anthropic and OpenAI agreed to evaluate each other's public models using in-house misalignment evaluations and released findings in parallel. While not whistleblowing per se, this represents increased transparency.
Limitations
The actual number of whistleblower reports remains unknown because:
- Many concerns may be raised internally without public disclosure
- Non-disparagement agreements suppress some reports
- Fear of career consequences deters whistleblowing
- No centralized reporting or tracking mechanism exists
Data Quality: Very limited. Based on public letters, media investigations, and individual whistleblower accounts. Represents visible tip of potentially larger iceberg of internal concerns.
Summary of Data Availability
| Metric | Data Quality | Public Availability |
|---|---|---|
| Voluntary commitment compliance | Good | Detailed 2025 study available |
| RSP threshold crossings | Poor | Companies control disclosure |
| Training-to-eval timeline | Poor | Mostly not disclosed |
| External red team engagement | Moderate | Some provider data, limited findings |
| Disclosure delays | Moderate | Tracked by watchdog organizations |
| Pre-deployment testing duration | Poor | Rarely disclosed |
| Model release velocity | Good | Well-tracked by multiple sources |
| Open vs closed capability gap | Good | Regular benchmark comparisons |
| Safety team turnover | Poor | Only high-profile departures visible |
| Whistleblower reports | Poor | Only public cases known |
Key Takeaways
- Compliance varies widely: Even leading labs struggle with certain commitments (especially model weight security and third-party reporting)
- Evaluation timelines are shortening: Despite increasing capabilities, safety testing windows are compressed, raising concerns about thoroughness
- Transparency gaps persist: Major model releases sometimes lack promised safety documentation, violating voluntary commitments
- Release velocity is accelerating: Competitive pressure has created unprecedented model release density, particularly in late 2025
- Open-source is catching up: The capability gap between open and closed models is narrowing from ~16 months to potential parity in some domains
- Safety team retention challenges: High-profile departures, particularly from OpenAI's Superalignment team, suggest cultural or resource allocation issues
- Limited whistleblower infrastructure: Despite the safety-critical nature of AI development, formal whistleblower protections and reporting mechanisms remain underdeveloped
- Data quality challenges: Most metrics suffer from limited disclosure, creating information asymmetry between labs and external stakeholders
Sources
- Do AI Companies Make Good on Voluntary Commitments to the White House? - Wang, Huang, Klyman et al. (2025), arXiv - August 2025 compliance study
- Anthropic's Responsible Scaling Policy - Anthropic
- Anthropic's Responsible Scaling Policy Update Makes a Step Backwards - SaferAI analysis
- Responsible Scaling: Comparing Government Guidance and Company Policy - Institute for AI Policy and Strategy
- 2025 AI Safety Index (Summer 2025) - Future of Life Institute
- AI Safety Index Winter 2025 - Future of Life Institute
- AI industry timelines to AGI getting shorter, but safety becoming less of a focus - Jeremy Kahn, Fortune
- OpenAI: Red Teaming GPT-4o, Operator, o3-mini, and Deep Research - ControlPlane
- Advancing red teaming with people and AI - OpenAI
- AI Red Teaming | Offensive Testing for AI Models - HackerOne
- AI Red Teaming: Applying Software TEVV for AI Evaluations - CISA
- AI models can be dangerous before public deployment - METR
- Pre-Deployment Evaluation of OpenAI's o1 Model - NIST (US AISI)
- Pre-Deployment Evaluation of OpenAI's o1 Model - UK AI Safety Institute
- Google's Gemini 2.5 Pro missing key safety report in violation of promises - Fortune
- British lawmakers accuse Google of 'breach of trust' over delayed Gemini 2.5 Pro safety report - Fortune
- Google is shipping Gemini models faster than its AI safety reports - TechCrunch
- 2025 Open Models Year in Review - Florian Brand, Substack
- AI Model Release Timeline - AI Flash Report
- Timeline of AI model releases in 2024 - VentureBeat
- 2025 LLM Year in Review - Andrej Karpathy
- AI Models Comparison 2025: Claude, Grok, GPT & More
- The Gap Between Open and Closed AI Models Might Be Shrinking - TIME, covering the Epoch AI report
- Open vs. Closed LLMs in 2025: Strategic Tradeoffs for Enterprise AI - Medium
- OpenAI's recent departures force leaders to reaffirm safety commitment - Axios
- Top OpenAI researcher resigns, saying company prioritized 'shiny products' over AI safety - Fortune
- OpenAI dissolves Superalignment AI safety team - CNBC
- The OpenAI Safety Exodus: 25+ Senior Researchers Departed
- "The OpenAI Files" reveals deep leadership concerns about Sam Altman and safety failures - Beatrice Nolan, Fortune
- The Fight to Protect AI Whistleblowers
- Anthropic: Compliance framework for California SB 53 - Anthropic