Enhancement Queue
This page tracks which pages need enhancement to match their respective style guides. Use this to prioritize work and avoid duplicate effort.
Quick Stats
| Content Type | Pending | In Progress | Complete | Style Guide |
|---|---|---|---|---|
| Models | ≈26 | 0 | 29 | Model Style Guide |
| Risks | ≈34 | 0 | 0 | KB Style Guide |
| Responses | ≈40 | 0 | 1 | KB Style Guide |
Enhancement Queues
High Priority (quality < 3)
| Page | Quality | Status | Notes |
|---|---|---|---|
| mesa-optimization | 2 | Pending | Needs significant work |
Accident Risks
| Page | Status |
|---|---|
| corrigibility-failure | Pending |
| distributional-shift | Pending |
| emergent-capabilities | Pending |
| goal-misgeneralization | Pending |
| instrumental-convergence | Pending |
| power-seeking | Pending |
| reward-hacking | Pending |
| sandbagging | Pending |
| scheming | Pending |
| sharp-left-turn | Pending |
| sycophancy | Pending |
| treacherous-turn | Pending |
Misuse Risks
| Page | Status |
|---|---|
| autonomous-weapons | Pending |
| deepfakes | Pending |
| disinformation | Pending |
| fraud | Pending |
| surveillance | Pending |
Structural Risks
| Page | Status |
|---|---|
| concentration-of-power | Pending |
| enfeeblement | Pending |
| erosion-of-agency | Pending |
| lock-in | Pending |
Epistemic Risks
| Page | Status |
|---|---|
| epistemic-collapse | Pending |
| institutional-capture | Pending |
| knowledge-monopoly | Pending |
| learned-helplessness | Pending |
| reality-fragmentation | Pending |
| trust-cascade | Pending |
Already High Quality (quality 4+)
These are lower priority but could still benefit from kb-2.0 alignment:
| Page | Quality | Status |
|---|---|---|
| authoritarian-takeover | 4 | Pending |
| bioweapons | 4 | Pending |
| cyber-psychosis | 4 | Pending |
| cyberweapons | 4 | Pending |
| deceptive-alignment | 4 | Pending |
| legal-evidence-crisis | 4 | Pending |
Technical
| Page | Status |
|---|---|
| ai-control | Pending |
| anthropic-core-views | Pending |
| interpretability | Pending |
| research-agendas | Pending |
| scalable-oversight | Pending |
| technical-research | Pending |
Governance - Legislation
| Page | Status |
|---|---|
| california-sb1047 | Pending |
| canada-aida | Pending |
| china-ai-regulations | Pending |
| colorado-ai-act | Pending |
| eu-ai-act | Pending |
| failed-stalled-proposals | Pending |
| nist-ai-rmf | Pending |
| us-executive-order | Pending |
| us-state-legislation | Pending |
Governance - Compute
| Page | Status |
|---|---|
| export-controls | Pending |
| international-regimes | Pending |
| monitoring | Pending |
| thresholds | Pending |
Governance - Industry
| Page | Status |
|---|---|
| responsible-scaling-policies | Pending |
| voluntary-commitments | Pending |
Governance - International
| Page | Status |
|---|---|
| international-summits | Pending |
| seoul-declaration | Pending |
Governance - General
| Page | Status |
|---|---|
| effectiveness-assessment | Pending |
| governance-policy | Pending |
Epistemic Tools
| Page | Status |
|---|---|
| ai-forecasting | Pending |
| content-authentication | Pending |
| coordination-tech | Pending |
| deliberation | Pending |
| epistemic-infrastructure | Pending |
| hybrid-systems | Pending |
| prediction-markets | Pending |
Field Building
| Page | Status |
|---|---|
| corporate-influence | Pending |
| field-building | Pending |
Institutions
| Page | Status |
|---|---|
| ai-safety-institutes | Complete |
| standards-bodies | Pending |
| Page | Status |
|---|---|
| epistemic-security | Pending |
| pause | Pending |
Models
See the Model Enhancement TODO for the full list.
Summary
- 29 complete (already enhanced)
- ~26 pending enhancement
High Priority Pending
| Page | Status |
|---|---|
| capabilities-to-safety-pipeline | Pending |
| compounding-risks-analysis | Pending |
| consensus-manufacturing-dynamics | Pending |
| cyber-psychosis-cascade | Pending |
| cyberweapons-attack-automation | Pending |
| disinformation-detection-race | Pending |
| disinformation-electoral-impact | Pending |
| economic-disruption | Pending |
How to Use This
For Claude Code Sessions
- Pick 3-5 items marked “Pending” from one category
- Update their status to “In Progress” in this file
- Enhance the pages following the style guide
- Mark as “Complete” when done
- Commit changes
Enhancement Checklist
For Risk Pages (kb-2.0):
- 2-3 paragraph Overview
- Risk Assessment table (Severity, Likelihood, Timeline)
- Responses That Address This Risk table
- Why This Matters section
- Key Uncertainties section
- Proper h2/h3 hierarchy
- `styleGuideVersion: "kb-2.0"` in frontmatter (see the sketch below)
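To make the frontmatter requirement concrete, here is a minimal sketch of the flag in a page's frontmatter; the `title` key and its value are illustrative only, and `styleGuideVersion` is the one field the checklists specify. The same flag applies to response pages below.

```yaml
---
title: "Reward Hacking"       # illustrative: any risk or response page
styleGuideVersion: "kb-2.0"   # the flag required by both kb-2.0 checklists
---
```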
For Response Pages (kb-2.0):
- 2-3 paragraph Overview
- Quick Assessment table
- Risks Addressed table
- How It Works section
- Critical Assessment section
- Getting Involved section
- Proper h2/h3 hierarchy
- `styleGuideVersion: "kb-2.0"` in frontmatter
For Model Pages:
- Overview with flowing prose
- Mermaid diagram
- Quantitative tables with estimates/ranges
- Scenario analysis
- Limitations section
- Ratings in frontmatter
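For the ratings item, a hedged sketch of what model-page frontmatter might look like. The key names `quality` and `importance` are assumptions inferred from the ratings this tracker displays, not a confirmed schema:

```yaml
---
title: "Example Model"   # hypothetical page
quality: 4               # assumed key name; 0-5 scale as used in this tracker
importance: 3            # assumed key name
---
```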