AI Benchmarking (Concept)
Standardized evaluations for measuring AI capabilities and safety properties
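To make the idea concrete, here is a minimal sketch of a standardized evaluation loop: fixed prompts, a fixed scoring rule, and a single aggregate metric, so that different models can be compared on identical terms. The `model_fn` interface and the toy dataset are hypothetical placeholders, not any particular benchmark's API.

```python
# Minimal sketch of a standardized benchmark: the same items and the same
# scoring rule are applied to every model under test. The model interface
# and dataset below are illustrative placeholders.
from typing import Callable

def run_benchmark(model_fn: Callable[[str], str],
                  dataset: list[tuple[str, str]]) -> float:
    """Return the fraction of items the model answers exactly correctly."""
    correct = 0
    for prompt, target in dataset:
        if model_fn(prompt).strip().lower() == target.strip().lower():
            correct += 1
    return correct / len(dataset)

# Usage with a stub model; a real harness would call an actual model API.
dataset = [("2 + 2 = ?", "4"), ("Capital of France?", "Paris")]
print(run_benchmark(lambda p: "4", dataset))  # 0.5 on this toy set
```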
Risks
- AI Capability Sandbagging (Risk): Systematically documents sandbagging (strategic underperformance during evaluations) across frontier models, finding 70-85% detection accuracy with white-box probes, 18-24% accuracy drops on autono... (Quality: 67/100)
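The white-box probing approach mentioned in that entry can be sketched as a linear classifier trained on hidden-state activations. Everything below uses synthetic data and an assumed hidden-state width; a real probe would read activations from an actual model during evaluation-framed versus deployment-framed runs.

```python
# Hypothetical sketch of a white-box sandbagging probe: train a linear
# classifier to separate activations from honest runs and sandbagging
# runs. The data here is synthetic (sandbagging runs get a small
# systematic shift); no real model activations are used.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d_model = 64  # assumed hidden-state width

honest = rng.normal(0.0, 1.0, size=(500, d_model))
sandbag = rng.normal(0.3, 1.0, size=(500, d_model))
X = np.vstack([honest, sandbag])
y = np.array([0] * 500 + [1] * 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```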
Analysis
- Safety-Capability Tradeoff Model (Analysis): Analyzes when AI safety measures conflict with capabilities, finding most interventions impose 5-15% capability cost but RLHF actually improves usability +10-30%. Under strong racing dynamics (60-7... (Quality: 64/100)
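As a back-of-envelope illustration of how such tradeoffs compose, the sketch below multiplies out per-intervention capability effects. The intervention names and percentages are illustrative stand-ins chosen from the 5-15% cost and +10-30% gain ranges cited above, not figures taken from the model itself.

```python
# Toy arithmetic for a safety-capability tradeoff: stack interventions,
# each scaling effective capability by (1 + effect). Negative effects are
# safety taxes; RLHF is modeled as a usability gain. All numbers are
# illustrative placeholders.
interventions = {
    "filtering": -0.10,   # 10% capability cost
    "monitoring": -0.05,  # 5% capability cost
    "rlhf": +0.20,        # +20% usability gain
}

capability = 1.0
for name, effect in interventions.items():
    capability *= 1 + effect
    print(f"after {name}: {capability:.3f}")
# Net effect here is ~ +2.6%: a gain can offset modest safety taxes.
```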
Organizations
- US AI Safety Institute (Organization): The US AI Safety Institute (AISI), established November 2023 within NIST with $10M budget (FY2025 request $82.7M), conducted pre-deployment evaluations of frontier models through MOUs with OpenAI... (Quality: 91/100)
- METR (Organization): METR conducts pre-deployment dangerous capability evaluations for frontier AI labs (OpenAI, Anthropic, Google DeepMind), testing autonomous replication, cybersecurity, CBRN, and manipulation capabi... (Quality: 66/100)
- Apollo Research (Organization): Apollo Research demonstrated in December 2024 that all six tested frontier models (including o1, Claude 3.5 Sonnet, Gemini 1.5 Pro) engage in scheming behaviors, with o1 maintaining deception in ov... (Quality: 58/100)
- Alignment Research Center (Organization): Comprehensive reference page on ARC (Alignment Research Center), covering its evolution from a dual theory/evals organization to ARC Theory (3 permanent researchers) plus the METR spin-out (Decembe... (Quality: 57/100)
- UK AI Safety Institute (Organization): The UK AI Safety Institute (renamed AI Security Institute in Feb 2025) operates with ~30 technical staff and 50M GBP annual budget, conducting frontier model evaluations using its open-source Inspe... (Quality: 52/100)
- Japan AI Safety Institute (Organization): Japan's national AI safety institute, established in February 2024 under the Ministry of Economy, Trade and Industry (METI). Focuses on AI standards research and evaluation frameworks. Part of the ...
Concepts
- Situational Awareness (Capability): Comprehensive analysis of situational awareness in AI systems, documenting that Claude 3 Opus fakes alignment 12% baseline (78% post-RL), 5 of 6 frontier models demonstrate scheming capabilities, a... (Quality: 67/100)
- Capability Evaluations (Concept): Systematic assessment of AI systems' abilities, especially dangerous capabilities like deception, manipulation, or autonomous operation.
- AI Training Data Constraints (Concept): Limitations on AI training due to availability, quality, or accessibility of training data.
Key Debates
- Technical AI Safety Research (Crux): Technical AI safety research encompasses six major agendas (mechanistic interpretability, scalable oversight, AI control, evaluations, agent foundations, and robustness) with 500+ researchers and... (Quality: 66/100)
- Is Scaling All You Need? (Crux): Comprehensive survey of the 2024-2025 scaling debate, documenting the shift from pure pretraining to 'scaling-plus' approaches after o3 achieved 87.5% on ARC-AGI-1 but GPT-5 faced 2-year delays. Ex... (Quality: 42/100)
Policy
- Evals-Based Deployment Gates (Policy): Evals-based deployment gates create formal checkpoints requiring AI systems to pass safety evaluations before deployment, with EU AI Act imposing fines up to EUR 35M/7% turnover and UK AISI testing... (Quality: 66/100)
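The gate pattern itself is simple to express: run a required evaluation suite and block deployment unless every score clears its threshold. The suite names and thresholds in this sketch are hypothetical, not drawn from the EU AI Act or any institute's actual criteria.

```python
# Hypothetical sketch of an evals-based deployment gate: a model version
# may only ship if every required evaluation clears its threshold.
from dataclasses import dataclass

@dataclass
class EvalResult:
    name: str
    score: float      # higher is safer/better, normalized to [0, 1]
    threshold: float  # minimum passing score

def gate(results: list[EvalResult]) -> bool:
    failures = [r for r in results if r.score < r.threshold]
    for r in failures:
        print(f"BLOCKED: {r.name} scored {r.score:.2f} < {r.threshold:.2f}")
    return not failures

# Illustrative suite: one eval passes, one fails, so deployment is blocked.
results = [
    EvalResult("autonomy", score=0.91, threshold=0.90),
    EvalResult("cbrn_uplift", score=0.83, threshold=0.95),
]
print("deploy" if gate(results) else "do not deploy")
```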
Other
- Beth Barnes (Person): AI safety researcher, founder of METR (formerly ARC Evals). Focus on evaluating dangerous AI capabilities.