Longterm Wiki

AI Control

Status: active

Research on deploying AI systems with safeguards sufficient to prevent harm even if the systems themselves are misaligned.
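The field's central example of such a safeguard is a trusted-monitoring protocol, as studied in the 2024 Redwood Research paper: a weaker trusted model scores the outputs of a stronger untrusted model, and suspicious outputs are escalated or replaced. The sketch below is a hypothetical schematic, not the paper's implementation; all model stubs, names, and the threshold are invented for illustration.

```python
def untrusted_model(task: str) -> str:
    # Stub for a capable but possibly misaligned model (hypothetical).
    return f"code for: {task}"

def trusted_monitor(output: str) -> float:
    # Stub for a weaker, trusted model that scores how suspicious an
    # output looks; a trivial keyword check stands in for it here.
    return 1.0 if "backdoor" in output else 0.0

def trusted_model(task: str) -> str:
    # Stub for a weaker but trusted fallback model.
    return f"baseline code for: {task}"

AUDIT_THRESHOLD = 0.5  # hypothetical cutoff for escalation

def control_protocol(task: str) -> str:
    """Trusted monitoring: run the untrusted model, score its output,
    and defer to the trusted model when suspicion is high."""
    output = untrusted_model(task)
    if trusted_monitor(output) >= AUDIT_THRESHOLD:
        # In a real deployment this branch would also flag the case
        # for human audit.
        return trusted_model(task)
    return output
```

The design goal is that safety rests on the trusted components (monitor, fallback, audit budget) rather than on the untrusted model's good intentions.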

Organizations: 2
Key Papers: 1
Risks Addressed: 2
First Proposed: 2024 (Greenblatt & Roger, Redwood Research)
Cluster: AI Control

Tags

function:robustness, scope:field

Organizations (2)

Organization         Role
Anthropic            active
Redwood Research     pioneer

Key Papers & Resources (1)

Sub-Areas (8)

Circuit Breakers (active; 0 orgs, 0 papers)
Inference-time interventions that halt model execution when unsafe behavior is detected.

Encoded Reasoning / Steganography Detection (emerging; 0 orgs, 0 papers)
Detecting hidden meaning or covert communication in AI-generated chains of thought.

Monitoring & Anomaly Detection (active; 0 orgs, 0 papers)
Runtime behavioral monitoring of deployed AI systems to catch unexpected behavior.

Multi-Agent Safety (active; 0 orgs, 0 papers)
Safety challenges and solutions for systems of multiple interacting AI agents.

Output Filtering (active; 0 orgs, 0 papers)
Post-generation safety filters that screen model outputs before delivery.

Sandboxing / Containment (active; 0 orgs, 0 papers)
Physical and logical isolation of AI systems to limit potential harm.

Structured Access / API-Only (active; 0 orgs, 0 papers)
Deployment models that restrict access to model weights, providing only API interfaces.

Tool-Use Restrictions (active; 0 orgs, 0 papers)
Limiting which external tools and actions AI agents can access.
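To give a concrete flavor of one of these sub-areas, here is a minimal output-filtering step of the kind described above: a post-generation check that screens a model's output before it is delivered. This is an invented sketch, not any organization's actual filter; the patterns and function names are hypothetical.

```python
import re

# Hypothetical blocklist an output filter might screen for.
BLOCKED_PATTERNS = [
    re.compile(r"(?i)rm\s+-rf\s+/"),        # destructive shell command
    re.compile(r"(?i)api[_-]?key\s*[:=]"),  # leaked credential
]

def filter_output(text: str) -> tuple[bool, str]:
    """Screen a model output before delivery.

    Returns (allowed, message): the original text if it passes,
    or a withheld-output placeholder if any pattern matches.
    """
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            return False, "[output withheld by safety filter]"
    return True, text
```

Real filters are typically learned classifiers rather than regex lists, but the interposition point is the same: the filter sits between generation and delivery.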