Solution Cruxes
- Quant.: METR's analysis shows AI agent task-completion capability doubling every 7 months over the past 6 years; extrapolating predicts a roughly 5-year timeline until AI agents independently complete software tasks that take humans weeks. (S: 4.0, I: 5.0, A: 3.5)
- Quant.: C2PA provenance adoption shows a <1% user verification rate despite major tech backing (Adobe, Microsoft), while detection accuracy is declining but remains 85-95%; detection therefore looks more near-term viable despite its theoretical disadvantages. (S: 4.0, I: 4.0, A: 4.5)
- Debate: Only 25-40% of experts believe AI-based verification can match generation capability; 60-75% expect verification to lag indefinitely, suggesting verification R&D may yield limited returns without alternative approaches like provenance. (S: 4.0, I: 4.5, A: 4.0)
Overview
Solution cruxes are the key uncertainties that determine which interventions we should prioritize in AI safety and governance. Unlike risk cruxes, which focus on the nature and magnitude of threats, solution cruxes examine the tractability and effectiveness of different approaches to addressing those threats. Your position on these cruxes should fundamentally shape what you work on, fund, or advocate for.
The landscape of AI safety solutions spans three critical domains: technical approaches that use AI systems themselves to verify and authenticate content; coordination mechanisms that align incentives across labs, nations, and institutions; and infrastructure investments that create sustainable epistemic institutions. Within each domain, fundamental uncertainties about feasibility, cost-effectiveness, and adoption timelines create genuine disagreements among experts about optimal resource allocation.
These disagreements have enormous practical implications. Whether AI-based verification can keep pace with AI-based generation determines if we should invest billions in detection infrastructure or pivot to provenance-based approaches. Whether frontier AI labs can coordinate without regulatory compulsion shapes the balance between industry engagement and government intervention. Whether credible commitment mechanisms can be designed determines if international AI governance is achievable or if we should prepare for an uncoordinated race.
Risk Assessment
| Risk Category | Severity | Likelihood | Timeline | Trend |
|---|---|---|---|---|
| Verification-generation arms race | High | 70% | 2-3 years | Accelerating |
| Coordination failure under pressure | Critical | 60% | 1-2 years | Worsening |
| Epistemic infrastructure collapse | High | 40% | 3-5 years | Stable |
| International governance breakdown | Critical | 55% | 2-4 years | Worsening |
Solution Effectiveness Overview
The 2025 AI Safety Index from the Future of Life Institute and the International AI Safety Report 2025, compiled by 96 AI experts representing 30 countries, provide sobering assessments of current solution effectiveness. Despite growing investment, core challenges including alignment, control, interpretability, and robustness remain unresolved, with system complexity growing year by year. The following table summarizes effectiveness estimates across major solution categories based on 2024-2025 assessments.
| Solution Category | Estimated Effectiveness | Investment Level (2024) | Maturity | Key Gaps |
|---|---|---|---|---|
| Technical alignment research | Moderate (35-50%) | $500M-1B | Early research | Scalability, verification |
| Interpretability | Promising (40-55%) | $100-200M | Active research | Superposition, automation |
| Responsible Scaling Policies (RSPs) | Limited (25-35%) | N/A (policy) | Deployed but weak | Vague thresholds, compliance |
| Third-party evaluations (METR) | Moderate (45-55%) | $10-20M | Operational | Coverage, standardization |
| Compute governance | Theoretical (20-30%) | $5-10M | Early research | Verification mechanisms |
| International coordination | Very limited (15-25%) | $50-100M | Nascent | US-China competition |
According to Anthropic's recommended research directions for AI safety, the main reason current AI systems do not pose catastrophic risks is that they lack many of the capabilities necessary for causing catastrophic harm, not because alignment solutions have been proven effective. This distinction is crucial for understanding the urgency of solution development.
Solution Prioritization Framework
Solution prioritization follows a decision tree keyed to how the central cruxes resolve: whether AI-based verification can scale, whether frontier labs can coordinate voluntarily, and whether credible commitment mechanisms can be designed.
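A minimal sketch of that branching logic is below; the recommended portfolios are simplified assumptions chosen for illustration, not the canonical framework.

```python
# Illustrative only: a toy decision tree mapping crux resolutions to priorities.
# The recommended portfolios are simplified assumptions, not a canonical framework.

def prioritize(verification_scales: bool, labs_coordinate: bool, commitments_credible: bool) -> list[str]:
    """Return a rough priority ordering given how three cruxes resolve."""
    priorities = []
    if verification_scales:
        priorities.append("Scale detection / verification infrastructure")
    else:
        priorities.append("Shift investment toward provenance (C2PA-style) systems")
    if labs_coordinate:
        priorities.append("Strengthen voluntary RSPs and third-party evaluation")
    else:
        priorities.append("Push for binding regulation and external oversight")
    if commitments_credible:
        priorities.append("Invest in compute governance and treaty design")
    else:
        priorities.append("Prepare resilience measures for an uncoordinated race")
    return priorities

if __name__ == "__main__":
    for combo in [(True, True, True), (False, False, False)]:
        print(combo, "->", prioritize(*combo))
```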
Technical Solution Cruxes
The technical domain centers on whether AI systems can be effectively turned against themselves: using artificial intelligence to verify, detect, and authenticate AI-generated content. This offense-defense question has profound implications for billions of dollars in research investment and infrastructure development.
Current Technical Landscape
| Approach | Investment Level | Success Rate | Commercial Deployment | Key Players |
|---|---|---|---|---|
| AI Detection | $100M+ annually | 85-95% (academic) | Limited | OpenAI, Originality.ai |
| Content Provenance | $50M+ annually | N/A (adoption metric) | Early stage | Adobe, Microsoft |
| Watermarking | $25M+ annually | Variable | Pilot programs | Google DeepMind (SynthID) |
| Verification Systems | $75M+ annually | Context-dependent | Research phase | DARPA (SemaFor) |
Can AI-based verification scale to match AI-based generation?
Whether AI systems designed for verification (fact-checking, detection, authentication) can keep pace with AI systems designed for generation.
Key Positions
Would Update On
- Breakthrough in generalizable detection
- Real-world deployment data on AI verification performance
- Theoretical analysis of offense-defense balance
- Economic analysis of verification costs vs generation costs
The current evidence presents a mixed picture. DARPA's SemaFor program, launched in 2021 with $26 million in funding, has demonstrated some success in semantic forensics for manipulated media, but primarily on specific types of synthetic content rather than the broad spectrum of AI-generated material now emerging. Meanwhile, commercial detection tools like GPTZero report accuracy rates of 85-95% on academic writing, but these drop significantly when generators are specifically designed to evade detection.
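Headline accuracy figures also interact strongly with base rates. A short Bayes' rule calculation, using assumed sensitivity, specificity, and prevalence values rather than measurements from any specific tool, shows how the usefulness of a flag depends on how common synthetic content is in the stream being checked:

```python
# Illustrative base-rate check: how useful is a "90% accurate" detector?
# Sensitivity, specificity, and prevalence values are assumptions for
# illustration, not measurements from the cited tools.

def precision(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(content is AI-generated | detector flags it), via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for prev in (0.01, 0.10, 0.50):
    p = precision(sensitivity=0.90, specificity=0.90, prevalence=prev)
    print(f"prevalence={prev:.0%}: P(AI | flagged) = {p:.1%}")
# At 1% prevalence only ~8% of flagged items are actually AI-generated;
# at 50% prevalence, ~90% are.
```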
The fundamental challenge lies in the asymmetric nature of the problem. Content generators need only produce plausible outputs, while detectors must distinguish between authentic and synthetic content across all possible generation techniques. This asymmetry may prove insurmountable, particularly as generation models become more sophisticated and numerous through capabilities scaling.
However, optimists point to potential advantages for verification systems: they can be specialized for detection tasks, leverage multiple modalities simultaneously, and benefit from centralized training on comprehensive datasets of known synthetic content. The emergence of foundation models specifically designed for verification, such as those being developed at Anthropic and OpenAI, suggests this approach may have untapped potential.
Should we prioritize content provenance or detection?
Whether resources should go to proving what's authentic (provenance) vs detecting what's fake (detection).
Key Positions
Would Update On
- C2PA adoption metrics
- Detection accuracy trends
- User behavior research on credential checking
- Cost comparison of approaches
The Coalition for Content Provenance and Authenticity (C2PA), backed by Adobe, Microsoft, Intel, and the BBC, has gained significant momentum since 2021, with over 50 member organizations and initial implementations in Adobe Creative Cloud and Microsoft products. The provenance approach embeds cryptographic metadata proving content's origin and modification history, creating an "immune system" for authentic content rather than trying to identify synthetic material.
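C2PA itself specifies a detailed manifest and certificate-chain format; purely as a sketch of the underlying idea (hash content at creation, sign a record binding the hash to its origin, verify both later), here is a standard-library illustration that is not the actual C2PA specification:

```python
# Minimal sketch of provenance-style signing, NOT the C2PA manifest format.
# Uses HMAC with a shared secret for brevity; real systems use public-key
# signatures and certificate chains.
import hashlib
import hmac
import json

SECRET = b"demo-signing-key"  # placeholder; real provenance uses device/CA keys

def sign_content(content: bytes, creator: str) -> dict:
    """Attach a provenance record binding the content hash to its origin."""
    digest = hashlib.sha256(content).hexdigest()
    record = {"creator": creator, "sha256": digest}
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return record

def verify_content(content: bytes, record: dict) -> bool:
    """Check both the hash (content unmodified) and the signature (record authentic)."""
    claimed = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return (hashlib.sha256(content).hexdigest() == record["sha256"]
            and hmac.compare_digest(expected, record["signature"]))

photo = b"raw image bytes"
rec = sign_content(photo, creator="camera-1234")
print(verify_content(photo, rec))          # True
print(verify_content(photo + b"x", rec))   # False: content was modified
```

The verification step explains the "100% for supported content" entry in the comparison table below: cryptographic checks either pass or fail, but only for content that carried credentials from the start.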
Provenance vs Detection Comparison
| Factor | Provenance | Detection |
|---|---|---|
| Accuracy | 100% for supported content | 85-95% (declining) |
| Coverage | Only new, participating content | All content types |
| Adoption Rate | <1% user verification | Universal deployment |
| Cost | High infrastructure | Moderate computational |
| Adversarial Robustness | High (cryptographic) | Low (adversarial ML) |
| Legacy Content | No coverage | Full coverage |
However, provenance faces substantial adoption challenges. Early data from C2PA implementations shows less than 1% of users actively check provenance credentials, and the system requires widespread adoption across platforms and devices to be effective. The approach also cannot address legacy content or situations where authentic content is captured without provenance systems. Detection remains necessary for the vast majority of existing content and will likely be required for years even if provenance adoption succeeds.
Can AI watermarks be made robust against removal?
Whether watermarks embedded in AI-generated content can resist adversarial removal attempts.
Key Positions
Would Update On
- Adversarial testing of production watermarks
- Theoretical bounds on watermark robustness
- Real-world watermark survival data
Google DeepMind's SynthID, launched in August 2023, represents the most advanced publicly available watermarking system, using statistical patterns imperceptible to humans but detectable by specialized algorithms. However, academic research consistently demonstrates that current watermarking approaches can be defeated through various attack vectors including adversarial perturbations, model fine-tuning, and regeneration techniques.
Research from UC Berkeley and the University of Maryland has shown that sophisticated attackers can remove watermarks with success rates exceeding 90% while preserving content quality. The theoretical foundations suggest fundamental limits to watermark robustness: any watermark that preserves content quality enough to be usable can potentially be removed by sufficiently sophisticated adversaries.
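For intuition about how statistical text watermarks are detected, and why paraphrasing and regeneration degrade them, the toy sketch below follows the "green list" family of academic proposals: a keyed pseudorandom subset of tokens is favored during generation, and detection tests whether green tokens are over-represented. SynthID's production method is not fully public, so this illustrates the general statistical idea, not Google's implementation:

```python
# Toy "green list" text watermark detector (statistical idea only; not SynthID).
# A watermarking generator would bias token choices toward a keyed pseudorandom
# "green" subset; the detector counts green tokens and computes a z-score.
import hashlib
import math

def is_green(token: str, key: str = "demo-key", fraction: float = 0.5) -> bool:
    """Keyed pseudorandom assignment of tokens to the green list."""
    h = hashlib.sha256((key + token).encode()).digest()
    return (h[0] / 255) < fraction

def detection_z_score(tokens: list[str], fraction: float = 0.5) -> float:
    """How far the observed green-token count deviates from chance."""
    n = len(tokens)
    greens = sum(is_green(t, fraction=fraction) for t in tokens)
    expected = fraction * n
    std = math.sqrt(n * fraction * (1 - fraction))
    return (greens - expected) / std

# Unwatermarked text should score near 0; watermarked text (generated by
# preferentially sampling green tokens) would score several standard deviations
# above. Paraphrasing or regeneration lowers the score, which is why robustness
# against removal is the open question discussed above.
sample = "the quick brown fox jumps over the lazy dog".split()
print(round(detection_z_score(sample), 2))
```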
Technical Alignment Research Progress (2024-2025)
Recent advances in mechanistic interpretability have demonstrated promising safety applications. Using attribution graphs, Anthropic researchers directly examined Claude 3.5 Haiku's internal reasoning processes, revealing hidden mechanisms beyond what the model displays in its chain-of-thought. As of March 2025, circuit tracing allows researchers to observe model reasoning, uncovering a shared conceptual space where reasoning happens before being translated into language.
| Alignment Approach | 2024-2025 Progress | Effectiveness Estimate | Key Challenges |
|---|---|---|---|
| Deliberative alignment | Extended thinking in Claude 3.7, o1-preview | 40-55% risk reduction | Latency, energy costs |
| Layered safety interventions | OpenAI redundancy approach | 30-45% risk reduction | Coordination complexity |
| Sparse autoencoders (SAEs) | Scaled to Claude 3 Sonnet | 35-50% interpretability gain | Superposition, polysemanticity |
| Circuit tracing | Direct observation of reasoning | Research phase | Automation, scaling |
| Adversarial techniques (debate) | Prover-verifier games | 25-40% oversight improvement | Equilibrium identification |
The 2024-2025 alignment research progress review cited in the reports table below notes that increasing reasoning depth can raise latency and energy consumption, posing challenges for real-time applications. Scaling alignment mechanisms to future, larger models or eventual AGI systems remains an open research question, with complexity growing exponentially with model size and task diversity.
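To make the sparse autoencoder row in the table above concrete: an SAE rewrites a model's activations as a sparse combination of many more features than the activation has dimensions, trading reconstruction error against a sparsity penalty. A minimal numpy sketch of the forward pass and loss follows; dimensions, penalty weight, and data are arbitrary placeholders, not values from any published SAE:

```python
# Minimal sparse autoencoder forward pass and loss, illustrating the technique
# named in the table above. All numbers here are arbitrary placeholders.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features = 64, 512          # SAEs are "wide": many more features than dims
W_enc = rng.normal(0, 0.1, (d_model, d_features))
W_dec = rng.normal(0, 0.1, (d_features, d_model))
b_enc = np.zeros(d_features)

def sae_loss(activations: np.ndarray, l1_coeff: float = 1e-3) -> float:
    """Reconstruction error plus sparsity penalty on feature activations."""
    features = np.maximum(activations @ W_enc + b_enc, 0.0)   # ReLU encoder
    reconstruction = features @ W_dec
    recon_err = np.mean((activations - reconstruction) ** 2)
    sparsity = l1_coeff * np.mean(np.abs(features))
    return recon_err + sparsity

batch = rng.normal(size=(32, d_model))   # stand-in for residual-stream activations
print(sae_loss(batch))
```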
Coordination Solution Cruxes
Coordination cruxes address whether different actors, from AI labs to nation-states, can align their behavior around safety measures without sacrificing competitive advantages or national interests. These questions determine the feasibility of governance approaches ranging from industry self-regulation to international treaties.
Current Coordination Landscape
| Mechanism | Participants | Binding Nature | Track Record | Key Challenges |
|---|---|---|---|---|
| RSPs | 4 major labs | Voluntary | Mixed compliance | Vague standards, competitive pressure |
| AI Safety Institute networks | 8+ countries | Non-binding | Early stage | Limited authority, funding |
| Export controls | US + allies | Legal | Partially effective | Circumvention, coordination gaps |
| Voluntary commitments | Major labs | Self-enforced | Poor | No external verification |
Can frontier AI labs meaningfully coordinate on safety?
Whether labs competing for AI supremacy can coordinate on safety measures without regulatory compulsion.
Key Positions
Would Update On
- Labs defecting from voluntary commitments
- Successful regulatory enforcement
- Evidence of coordination changing lab behavior
The emergence of Responsible Scaling Policies (RSPs) in 2023-2024, adopted by Anthropic, OpenAI, and Google DeepMind, represents the most significant attempt at voluntary lab coordination to date. These policies outline safety evaluations and deployment standards that labs commit to follow as their models become more capable.
However, early implementation has revealed significant limitations: evaluation standards remain vague, triggering thresholds are subjective, and competitive pressures create incentives to interpret requirements leniently. Analysis by METR and ARC Evaluations shows substantial variations in how labs implement similar commitments.
Third-Party Evaluation Effectiveness
METR (formerly ARC Evals) has emerged as the leading third-party evaluator of frontier AI systems, conducting pre-deployment evaluations of GPT-4, Claude 2, and Claude 3.5 Sonnet. Their April 2025 evaluation of OpenAI's o3 and o4-mini found these models displayed higher autonomous capabilities than other public models tested, with o3 appearing somewhat prone to "reward hacking." METR's evaluation of Claude 3.7 Sonnet found impressive AI R&D capabilities on RE-Bench, though no significant evidence of dangerous autonomous capabilities.
| Evaluation Organization | Models Evaluated (2024-2025) | Key Findings | Limitations |
|---|---|---|---|
| METR | GPT-4, Claude 2/3.5/3.7, o3/o4-mini | Autonomous capability increases; reward hacking in o3 | Limited to cooperative labs |
| UK AI Safety Institute | Pre-deployment evals for major labs | Advanced AI evaluation frameworks | Resource constraints |
| Internal lab evaluations | All frontier models | Proprietary capabilities assessments | Conflict of interest |
METR proposes measuring AI performance in terms of the length of tasks AI agents can complete, showing this metric has been exponentially increasing over the past 6 years with a doubling time of around 7 months. Extrapolating this trend predicts that within five years, AI agents may independently complete a large fraction of software tasks that currently take humans days or weeks.
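The arithmetic behind that extrapolation is simple compounding: a 7-month doubling time implies roughly a 380x increase in completable task length over five years. A quick check, with the starting task length chosen as an illustrative assumption rather than METR's measured value:

```python
# Extrapolating METR's ~7-month doubling time in completable task length.
# The starting value below is an illustrative assumption, not METR's estimate.
DOUBLING_MONTHS = 7
start_task_hours = 1.0      # assume agents reliably complete ~1-hour tasks today

for years in (1, 3, 5):
    factor = 2 ** (years * 12 / DOUBLING_MONTHS)
    print(f"{years}y: x{factor:,.0f} -> ~{start_task_hours * factor:,.0f} hours "
          f"(~{start_task_hours * factor / 40:,.1f} work-weeks)")
# 5 years -> ~380x, i.e. tasks on the order of weeks of human work,
# consistent with the extrapolation described above.
```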
RSP Compliance Analysis (2024-2025)
Anthropic's October 2024 RSP update introduced more flexible approaches but drew criticism from external analysts. According to SaferAI, Anthropic's grade dropped from 2.2 to 1.9, placing it alongside OpenAI and DeepMind in the "weak" category. The primary issue lies in the shift away from precisely defined capability thresholds and mitigation measures. Anthropic acknowledged falling short in some areas, including completing evaluations 3 days late, though these instances posed minimal safety risk.
| RSP Element | Anthropic | OpenAI | Google DeepMind |
|---|---|---|---|
| Capability thresholds | ASL levels (loosened) | Preparedness framework | Frontier Safety Framework |
| Evaluation frequency | 6 months (extended from 3) | Ongoing | Pre-deployment |
| Third-party review | Annual procedural | Limited | Limited |
| Public transparency | Partial | Limited | Limited |
| Binding enforcement | Self-enforced | Self-enforced | Self-enforced |
Historical Coordination Precedents
| Industry | Coordination Success | Key Factors | AI Relevance |
|---|---|---|---|
| Nuclear weapons | Partial (NPT, arms control) | Mutual destruction, verification | High stakes, but clearer parameters |
| Pharmaceuticals | Mixed (safety standards vs. pricing) | Regulatory oversight, liability | Similar R&D competition |
| Semiconductors | Successful (SEMATECH) | Government support, shared costs | Technical collaboration model |
| Social media | Poor (content moderation) | Light regulation, network effects | Platform competition dynamics |
Historical precedent suggests mixed prospects for voluntary coordination in high-stakes competitive environments. The semiconductor industry's successful coordination on safety standards through SEMATECH offers some optimism, but it occurred under different competitive dynamics and with explicit government support. The pharmaceutical industry's mixed record, with some successful self-regulation but also notable failures requiring regulatory intervention, may be more analogous to AI development.
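The underlying coordination problem can be framed as a two-player game in which each lab chooses whether to comply with a costly safety commitment or defect for competitive advantage. With payoffs shaped like the illustrative (assumed) numbers below, defection dominates, which is the standard argument for external enforcement or mechanisms that change the payoffs:

```python
# Toy two-lab commitment game; payoff numbers are illustrative assumptions.
# Each lab picks "comply" (bear safety costs) or "defect" (ship faster).
PAYOFFS = {
    ("comply", "comply"): (3, 3),   # shared safety benefit
    ("comply", "defect"): (0, 4),   # complier loses market position
    ("defect", "comply"): (4, 0),
    ("defect", "defect"): (1, 1),   # race dynamics, higher risk for everyone
}

def best_response(opponent_action: str) -> str:
    """What should a lab do, given the other's choice?"""
    return max(("comply", "defect"),
               key=lambda a: PAYOFFS[(a, opponent_action)][0])

for other in ("comply", "defect"):
    print(f"if the other lab plays {other}: best response = {best_response(other)}")
# Defection is the best response either way (a prisoner's dilemma structure),
# which is why credible enforcement or altered incentives matter.
```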
Can US-China coordination on AI governance succeed?
Whether the major AI powers can coordinate despite geopolitical competition.
Key Positions
Would Update On
- US-China AI discussions outcomes
- Coordination on specific risks (bio, nuclear)
- Changes in geopolitical relationship
- Success/failure of UK/Korea AI summits on coordination
Current US-China AI relations are characterized by strategic competition rather than cooperation. Export controls on semiconductors, restrictions on Chinese AI companies, and national security framings dominate the policy landscape. The CHIPS Act and export restrictions target Chinese AI development directly, while China's response includes increased domestic investment and alternative supply chains.
However, some limited dialogue continues through academic conferences, multilateral forums like the G20, and informal diplomatic channels. The UK AI Safety Institute and the Seoul Declaration provide potential multilateral venues for engagement.
International Coordination Prospects by Risk Area
| Risk Category | US-China Cooperation Likelihood | Key Barriers | Potential Mechanisms |
|---|---|---|---|
| AI-enabled bioweapons | 60-70% | Technical verification | Joint research restrictions |
| Nuclear command systems | 50-60% | Classification concerns | Backchannel protocols |
| Autonomous weapons | 30-40% | Military applications | Geneva Convention framework |
| Economic competition | 10-20% | Zero-sum framing | Very limited prospects |
The most promising path may involve narrow cooperation on specific risks where interests clearly align, such as preventing AI-enabled bioweapons or nuclear command-and-control accidents. The precedent of nuclear arms control offers both hope and caution: the US and Soviet Union managed meaningful arms control despite existential competition, but nuclear weapons had clearer technical parameters than AI risks.
Can credible AI governance commitments be designed?
Whether commitment mechanisms (RSPs, treaties, escrow) can be designed that actors can't easily defect from.
Key Positions
Would Update On
- Track record of RSPs and similar commitments
- Progress on compute governance/monitoring
- Examples of commitment enforcement
- Game-theoretic analysis of commitment mechanisms
The emerging field of compute governance offers the most promising avenue for credible commitment mechanisms. Unlike software or model parameters, computational resources are physical and potentially observable. Research by GovAI has outlined monitoring systems that could track large-scale training runs, creating verifiable bounds on certain types of AI development.
However, the feasibility of comprehensive compute monitoring remains unclear. Cloud computing, distributed training, and algorithm efficiency improvements create multiple pathways for evading monitoring systems. International variation in monitoring capabilities and willingness could create safe havens for actors seeking to avoid commitments.
Compute Governance Verification Mechanisms
GovAI's compute governance research identifies three primary mechanisms for using compute as a governance lever: tracking and monitoring compute to gain visibility into AI development; subsidizing or limiting access to shape resource allocation; and building "guardrails" into hardware to enforce rules. The AI governance platform market is projected to grow from $227 million in 2024 to $4.83 billion by 2034, driven by generative AI adoption and regulations like the EU AI Act.
| Verification Mechanism | Feasibility | Current Status | Key Barriers |
|---|---|---|---|
| Training run reporting | High | Partial implementation | Voluntary compliance |
| Chip-hour tracking | Medium | Compute providers use for billing | International coordination |
| Flexible Hardware-Enabled Guarantees (FlexHEG) | Low-Medium | Research phase | Technical complexity |
| Workload classification (zero-knowledge) | Low | Theoretical | Privacy concerns, adversarial evasion |
| Data center monitoring | Medium | Limited | Jurisdiction gaps |
According to the Institute for Law & AI, meaningful enforcement requires regulators to be aware of, or able to verify, the amount of compute being used; a regulatory threshold is ineffective if regulators have no way of knowing whether it has been reached. Related research proposes mechanisms to verify that data centers are not conducting large AI training runs exceeding agreed-upon thresholds.
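As a sketch of how such a threshold check works in practice: training compute is commonly approximated as 6 x parameters x training tokens, so a provider or regulator needs only rough estimates of those two quantities to tell whether a run approaches a reporting threshold such as the 10^26 FLOP figure cited in the governance table below. The model sizes used here are illustrative placeholders, not real disclosures:

```python
# Rough compute-threshold check using the common approximation
#   training FLOP ~= 6 * parameters * training tokens.
# Model figures below are illustrative placeholders, not real disclosures.
THRESHOLD_FLOP = 1e26   # reporting threshold referenced in the governance table below

def training_flop(params: float, tokens: float) -> float:
    return 6 * params * tokens

hypothetical_runs = {
    "mid-size run": (7e10, 2e12),       # 70B params, 2T tokens
    "frontier-scale run": (2e12, 1e13), # 2T params, 10T tokens
}

for name, (params, tokens) in hypothetical_runs.items():
    flop = training_flop(params, tokens)
    flag = "exceeds" if flop >= THRESHOLD_FLOP else "below"
    print(f"{name}: ~{flop:.1e} FLOP ({flag} the 1e26 threshold)")
```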
International Governance Coordination Status
The UN High-Level Advisory Body on AI (HLAB) submitted seven recommendations in August 2024: launching a twice-yearly intergovernmental dialogue; creating an independent international scientific panel; an AI standards exchange; a capacity development network; a global fund for AI; a global AI data framework; and a dedicated AI office within the UN Secretariat. However, academic analysis in International Affairs concludes that a governance deficit remains due to the inadequacy of existing initiatives, gaps in the landscape, and difficulties reaching agreement over more appropriate mechanisms.
| Governance Initiative | Participants | Binding Status | Effectiveness Assessment |
|---|---|---|---|
| AI Safety Summits | 28+ countries | Non-binding | Limited (pageantry vs progress) |
| EU AI Act | EU members | Binding | Moderate (implementation pending) |
| US Executive Order | US federal | Executive (rescindable) | Limited (political uncertainty) |
| UN HLAB recommendations | UN members | Non-binding | Minimal (no implementation) |
| Bilateral US-China dialogues | US, China | Ad hoc | Very limited (competition dominant) |
Collective Intelligence and Infrastructure Cruxes
The final domain addresses whether we can build sustainable systems for truth, knowledge, and collective decision-making that can withstand both market pressures and technological disruption. These questions determine the viability of epistemic institutions as a foundation for AI governance.
Current Epistemic Infrastructure
| Platform/System | Annual Budget | User Base | Accuracy Rate | Sustainability Model |
|---|---|---|---|---|
| Wikipedia | $150M | 1.7B monthly | 90%+ (citations) | Donations |
| Fact-checking orgs | $50M total | 100M+ reach | 85-95% | Mixed funding |
| Academic peer review | $5B+ (estimated) | Research community | Variable | Institution-funded |
| Prediction markets | $100M+ volume | <1M active | 75-85% | Commercial |
Can AI + human forecasting substantially outperform either alone?
Whether combining AI forecasting with human judgment produces significantly better predictions than either approach separately.
Key Positions
Would Update On
- Systematic comparison studies
- Metaculus AI forecasting results
- Domain-specific performance data
Metaculus has been conducting systematic experiments with AI forecasting since 2023, with early results suggesting that AI systems can match or exceed human forecasters on certain types of questions, particularly those involving quantitative trends or pattern recognition from large datasets. However, humans continue to outperform on questions requiring contextual judgment, novel reasoning, or understanding of political and social dynamics.
AI vs Human Forecasting Performance
| Question Type | AI Performance | Human Performance | Combination Performance |
|---|---|---|---|
| Quantitative trends | 85-90% accuracy | 75-80% accuracy | 88-93% accuracy |
| Geopolitical events | 60-70% accuracy | 75-85% accuracy | 78-88% accuracy |
| Scientific breakthroughs | 70-75% accuracy | 80-85% accuracy | 83-88% accuracy |
| Economic indicators | 80-85% accuracy | 70-75% accuracy | 83-87% accuracy |
The combination approaches show promise but remain under-tested. Initial experiments suggest that human forecasters can improve their performance by consulting AI predictions, while AI systems benefit from human-provided context and reasoning. However, the optimal architectures for human-AI collaboration remain unclear, and the cost-effectiveness compared to scaling either approach independently has not been established.
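Forecast quality in comparisons like the table above is typically measured with Brier scores (mean squared error between probability forecasts and 0/1 outcomes). The sketch below scores an AI forecaster, a human forecaster, and a naive average of the two on invented data; it illustrates the metric and the simplest ensemble baseline, not any empirical result:

```python
# Brier scoring and a naive human+AI forecast average. All probabilities and
# outcomes below are invented for illustration.
def brier(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

outcomes        = [1,    0,    1,    1,    0]
ai_forecasts    = [0.80, 0.30, 0.60, 0.70, 0.40]
human_forecasts = [0.60, 0.20, 0.85, 0.90, 0.10]
combined = [(a + h) / 2 for a, h in zip(ai_forecasts, human_forecasts)]

for name, f in [("AI", ai_forecasts), ("human", human_forecasts), ("combined", combined)]:
    print(f"{name:>8}: Brier = {brier(f, outcomes):.3f}")
# Whether a combination actually beats both inputs is exactly the empirical
# question this crux asks; simple averaging is only a baseline.
```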
Can epistemic infrastructure be funded as a public good?
Whether verification, fact-checking, and knowledge infrastructure can achieve sustainable funding without commercial incentives.
Key Positions
Would Update On
- Government investment in epistemic infrastructure
- Successful commercial models for verification
- Philanthropic commitment levels
- Platform willingness to pay for verification
Current epistemic infrastructure suffers from chronic underfunding relative to content generation systems. Fact-checking organizations operate on annual budgets of millions while misinformation spreads through platforms with budgets in the billions. Wikipedia, one of the most successful epistemic public goods, operates on approximately $150 million annually while supporting roughly 1.7 billion monthly users, a funding ratio of about $0.09 per monthly active user per year.
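The per-user figure is simple division of the budget and user numbers from the table above:

```python
# Funding-per-user arithmetic for the epistemic infrastructure table above.
annual_budget_usd = 150e6        # Wikipedia, approximate
monthly_users = 1.7e9
print(f"${annual_budget_usd / monthly_users:.2f} per monthly user per year")
# ~$0.09, the ratio cited above.
```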
Funding Landscape for Epistemic Infrastructure
| Source | Annual Contribution | Sustainability | Scalability |
|---|---|---|---|
| Government | $200M+ (EU DSA, others) | Political dependent | High potential |
| Philanthropy | $100M+ (Omidyar, others) | Mission-driven | Medium potential |
| Platform fees | $50M+ (voluntary) | Unreliable | Low potential |
| Commercial models | $25M+ (fact-check APIs) | Market-dependent | High potential |
Government funding varies dramatically by jurisdiction. The EU's Digital Services Act includes provisions for funding fact-checking and verification systems, while the US has been more reluctant to fund what could be perceived as content moderation. Philanthropic support, led by foundations like Omidyar Network and Craig Newmark Philanthropies, has provided crucial early-stage funding but may be insufficient for the scale required.
Current State and Trajectory
Near-term Developments (1-2 years)
The immediate trajectory will be shaped by several ongoing developments:
- Commercial verification systems from major tech companies will provide real-world performance data
- Regulatory frameworks in the EU and potentially other jurisdictions will test enforcement mechanisms
- International coordination through AI Safety Institutes and summits will reveal cooperation possibilities
- Lab RSP implementation will demonstrate voluntary coordination track record
Medium-term Projections (2-5 years)
| Domain | Most Likely Outcome | Probability | Strategic Implications |
|---|---|---|---|
| Technical verification | Modest success, arms race dynamics | 60% | Continued R&D investment, no single solution |
| Lab coordination | External oversight required | 65% | Regulatory frameworks necessary |
| International governance | Narrow cooperation only | 55% | Focus on specific risks, not comprehensive regime |
| Epistemic infrastructure | Chronically underfunded | 70% | Accept limited scale, prioritize high-leverage applications |
The resolution of these solution cruxes will fundamentally shape AI safety strategy over the next decade. If technical verification approaches prove viable, we may see an arms race between generation and detection systems. If coordination mechanisms succeed, we could see the emergence of global AI governance institutions. If they fail, we may face an uncoordinated race with significant safety risks.
Key Research Priorities
The highest-priority uncertainties requiring systematic research include:
Technical Verification Research
- Systematic adversarial testing of verification systems across attack scenarios
- Economic analysis comparing costs of verification vs generation at scale
- Theoretical bounds on detection performance under optimal adversarial conditions
- User behavior studies on provenance checking and verification adoption
Coordination Mechanism Analysis
- Game-theoretic modeling of commitment mechanisms under competitive pressure
- Historical analysis of coordination successes and failures in high-stakes domains
- Empirical tracking of RSP implementation and compliance across labs
- Regulatory effectiveness studies comparing different governance approaches
Epistemic Infrastructure Design
- Hybrid system architecture for combining AI and human judgment optimally
- Funding model innovation for sustainable epistemic public goods
- Platform integration studies for verification system adoption
- Cross-platform coordination mechanisms for epistemic infrastructure
Key Uncertainties and Strategic Dependencies
These cruxes are interconnected in complex ways that create strategic dependencies:
- Technical feasibility affects coordination incentives: If verification systems work well, labs may be more willing to adopt them voluntarily
- Coordination success affects infrastructure funding: Successful international cooperation could unlock government investment in epistemic public goods
- Infrastructure sustainability affects technical development: Reliable funding enables long-term R&D programs for verification systems
- International dynamics affect all domains: US-China competition shapes both technical development and coordination possibilities
Understanding these dependencies will be crucial for developing comprehensive solution strategies that account for the interconnected nature of technical, coordination, and infrastructure challenges.
Sources & Resources
Technical Research Organizations
| Organization | Focus Area | Key Publications |
|---|---|---|
| DARPA | Semantic forensics, verification | SemaFor program |
| C2PA | Content provenance standards | Technical specification |
| Google DeepMind | Watermarking, detection | SynthID research |
Governance and Coordination Research
| Organization | Focus Area | Key Resources |
|---|---|---|
| GovAI (Centre for the Governance of AI) | AI governance, coordination | Compute governance research |
| RAND Corporation | Strategic analysis | AI competition studies |
| CNAS | Security, international relations | AI security reports |
Epistemic Infrastructure Organizations
| Organization | Focus Area | Key Resources |
|---|---|---|
| Metaculus | Forecasting, prediction | AI forecasting project |
| Good Judgment | Superforecasting | Crowd forecasting methodology |
Safety Research and Evaluation
| Organization | Focus Area | Key Resources |
|---|---|---|
| METR | Third-party AI evaluations | Autonomous capability assessments |
| Anthropic Alignment | Technical alignment research | Research directions 2025 |
| UK AI Safety Institute | Government evaluations | Evaluation approach |
Key 2024-2025 Reports
| Report | Organization | Focus |
|---|---|---|
| 2025 AI Safety Index | Future of Life Institute | Industry safety practices |
| International AI Safety Report 2025 | 96 AI experts, 30 countries | Global safety assessment |
| Alignment research progress review (2024-2025) | Alignment Forum | Research progress review |
| Mechanistic Interpretability Review | TMLR | Interpretability research survey |
| Compute governance report | GovAI | Compute governance mechanisms |
| Global AI Governance Analysis | International Affairs | Governance deficit assessment |