Deployment & Control (Overview)
Deployment methods focus on maintaining safety during AI system operation.
Containment:
Sandboxing: Isolating AI systems from the outside world
AI Control: Maintaining human oversight and control
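As a hedged illustration of the containment pattern above, the sketch below runs untrusted model-generated code in a separate OS process with a time limit and a stripped environment. Real containment relies on much stronger isolation (VMs, gVisor, seccomp, network partitioning); the function name and timeout value here are illustrative, not drawn from any specific system.

```python
# Minimal sandboxing sketch: execute untrusted code in a child process
# with a time limit and no inherited environment. This is a pattern
# illustration only, not a secure sandbox.
import subprocess
import sys

def run_sandboxed(code: str, timeout_s: float = 5.0) -> str:
    """Run untrusted Python code in an isolated child process."""
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site dirs
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises TimeoutExpired if the code runs too long
        env={},             # do not leak credentials via environment (POSIX; Windows may need SystemRoot)
    )
    return result.stdout

print(run_sandboxed("print(2 + 2)"))  # -> 4
```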
Access Management:
Structured Access: Tiered access to model capabilities
Tool Restrictions: Limiting available actions and tools
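A minimal sketch of the tool-restriction idea in this block: agent tool calls pass through a dispatcher that enforces an allowlist and a human-in-the-loop tier. The tool names, tiers, and stub implementations are assumptions for illustration, not any particular framework's API.

```python
# Hedged sketch of tool-use restrictions via an allowlist dispatcher.
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {
    "read_file": lambda path: open(path).read(),            # stub
    "web_search": lambda query: f"results for {query!r}",   # stub
}
ALLOWED = {"read_file", "web_search"}  # hard capability limit
REQUIRE_HUMAN = {"web_search"}         # human-approval tier (illustrative)

def dispatch(tool: str, approved_by_human: bool = False, **kwargs: Any) -> Any:
    """Execute a tool call only if it clears the allowlist and approval tier."""
    if tool not in ALLOWED:
        raise PermissionError(f"tool {tool!r} is not on the allowlist")
    if tool in REQUIRE_HUMAN and not approved_by_human:
        raise PermissionError(f"tool {tool!r} requires human approval")
    return TOOLS[tool](**kwargs)

print(dispatch("web_search", approved_by_human=True, query="AI control"))
# dispatch("delete_file", path="/")  # would raise PermissionError: not allowlisted
```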
Output Safety:
Output Filtering: Screening model outputs for harm
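To make the output-filtering pattern concrete, here is a toy post-hoc screen: every response is checked before release and withheld on a match. The regex rules stand in for a real learned classifier; the patterns and refusal message are purely illustrative assumptions.

```python
# Minimal output-filtering sketch: screen model responses before release.
# Regex rules are a toy stand-in for a learned harm classifier.
import re

BLOCK_PATTERNS = [
    re.compile(r"(?i)how to synthesize\s+\w+"),  # toy harmful-instructions rule
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # SSN-shaped strings (PII example)
]

def filter_output(text: str) -> str:
    """Return the text unchanged, or a refusal if any block rule matches."""
    for pattern in BLOCK_PATTERNS:
        if pattern.search(text):
            return "[response withheld by output filter]"
    return text

print(filter_output("The capital of France is Paris."))  # passes through
```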
Multi-System:
Multi-Agent Safety: Safety in systems with multiple AI agents