Longterm Wiki

Sandboxing / Containment

Sandboxing limits an AI system's access to resources, networks, and other capabilities, serving as a defense-in-depth measure rather than a standalone safeguard. METR's August 2025 evaluation estimated GPT-5's task time horizon at roughly 2 hours, which it judged insufficient for autonomous replication. Informal AI boxing experiments have reported social-engineering escape rates of roughly 60-70%, suggesting that human gatekeepers alone are a weak containment layer.
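The resource-limiting idea can be sketched at the OS level. The example below is a minimal, illustrative Python sketch (POSIX-only; the function name `run_sandboxed` and the specific limit values are assumptions, not part of any referenced system) that runs untrusted code in a child process with kernel-enforced CPU, memory, and process-count limits. It does not by itself block network or filesystem access; real containment layers namespaces, seccomp filters, or containers on top, which is the defense-in-depth point.

```python
import resource
import subprocess
import sys

def run_sandboxed(code: str, cpu_seconds: int = 2,
                  mem_bytes: int = 256 * 1024 * 1024) -> subprocess.CompletedProcess:
    """Run untrusted Python code in a child process with OS-level resource limits.

    Illustrative sketch only: limits CPU time, address space, and process
    creation. Network/filesystem isolation requires additional layers
    (namespaces, seccomp, containers).
    """
    def set_limits():
        # Hard-cap CPU time: the kernel signals/kills the child when exceeded.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Cap total address space to bound memory use.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        # Forbid spawning further processes from inside the sandbox.
        resource.setrlimit(resource.RLIMIT_NPROC, (0, 0))

    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env/site
        preexec_fn=set_limits,               # applied in the child before exec
        env={},                              # empty environment: no leaked secrets
        capture_output=True,
        text=True,
        timeout=cpu_seconds + 5,             # wall-clock backstop
    )
```

A well-behaved snippet runs normally, while a CPU-burning loop is terminated by the kernel once the limit is hit, independent of any cooperation from the sandboxed code.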

Related Pages

Risks

Rogue AI Scenarios

Analysis

Goal Misgeneralization Probability Model

Approaches

Multi-Agent Safety
Circuit Breakers / Inference Interventions
Dangerous Capability Evaluations
Sleeper Agent Detection

Organizations

OpenAI

Concepts

Alignment Deployment Overview
Self-Improvement and Recursive Enhancement

Tags

containment, defense-in-depth, agent-safety, container-security, deployment-safety