Sandboxing / Containment
Sandboxing limits an AI system's access to resources, networks, and capabilities as a defense-in-depth measure: even if a model attempts harmful actions, the containment layer bounds what it can affect. METR's August 2025 evaluation put GPT-5's task time horizon at approximately 2 hours, which it judged insufficient for autonomous replication. Informal AI-boxing experiments have reported escape rates of 60-70% through social engineering alone, suggesting human gatekeepers are a weak containment layer on their own.
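As a minimal illustration of the idea (not any lab's actual containment stack), one layer of sandboxing is running untrusted model-generated code in a child process with hard OS resource limits. The sketch below uses Python's standard `resource` and `subprocess` modules on a POSIX system; the limits chosen (2 s CPU, 256 MiB memory) are arbitrary example values.

```python
import resource
import subprocess
import sys

def limit_resources():
    # Runs in the child process just before exec: cap CPU time at
    # 2 seconds and the address space at 256 MiB. Exceeding either
    # limit kills the child rather than the parent.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))
    resource.setrlimit(resource.RLIMIT_AS, (256 * 1024**2, 256 * 1024**2))

def run_sandboxed(code: str) -> subprocess.CompletedProcess:
    """Execute untrusted Python code in a resource-limited child process."""
    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=limit_resources,  # POSIX-only hook, applied in the child
        capture_output=True,
        text=True,
        timeout=5,  # wall-clock backstop in the parent
    )

result = run_sandboxed("print('hello from the sandbox')")
print(result.stdout.strip())
```

Resource limits are only one layer: a real deployment would also drop filesystem and network access (namespaces, seccomp, or container isolation), which this sketch does not attempt.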
Related Pages
Tool-Use Restrictions
Tool-use restrictions limit what actions and APIs AI systems can access, directly constraining their potential for harm.
Structured Access / API-Only
Structured access provides AI capabilities through controlled APIs rather than releasing model weights, maintaining developer control over deployment.
Anthropic
An AI safety company founded by former OpenAI researchers that develops frontier AI models while pursuing safety research, including the Claude model family.
METR
Model Evaluation and Threat Research conducts dangerous-capability evaluations for frontier AI models, testing for autonomous replication, cybersecurity exploitation, and related risks.
Agentic AI
AI systems that autonomously take actions in the world to accomplish goals. Industry forecasts project 40% of enterprise applications will include agentic AI.