AI Control
Status: active
Research on deploying AI systems with safeguards that remain effective even if the systems are misaligned.
Organizations: 2 · Key Papers: 1 · Risks Addressed: 2
First Proposed: 2024 (Greenblatt & Roger, Redwood Research)
Cluster: AI Control
Tags: function:robustness, scope:field
Organizations (2)
| Organization | Role |
|---|---|
| Anthropic | active |
| Redwood Research | pioneer |
Key Papers & Resources (1)
[Seminal] AI Control: Improving Safety Despite Intentional Subversion (Greenblatt & Roger, 2024)
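The seminal paper studies control protocols in which a trusted but weaker model monitors actions proposed by a more capable, untrusted model. A minimal sketch of such a routing rule is below; the function name and the threshold values are illustrative assumptions, not taken from the paper.

```python
def route_action(action: str, monitor_suspicion: float,
                 audit_threshold: float = 0.95,
                 defer_threshold: float = 0.5) -> str:
    """Trusted-monitoring sketch (illustrative thresholds).

    A trusted model assigns a suspicion score in [0, 1] to each action
    proposed by the untrusted model:
      - high suspicion  -> escalate to a human audit
      - medium suspicion -> defer to the trusted model's own action
      - low suspicion   -> execute the untrusted model's action
    """
    if monitor_suspicion >= audit_threshold:
        return "audit"
    if monitor_suspicion >= defer_threshold:
        return "defer"
    return "execute"
```

The design point is that safety does not rest on the untrusted model's intentions: even a deliberately subversive model must get its actions past the monitor, and the rare, most suspicious actions are routed to human review.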
Sub-Areas (8)
| Name | Description | Status | Orgs | Papers |
|---|---|---|---|---|
| Circuit Breakers | Inference-time interventions that halt model execution when unsafe behavior is detected. | active | 0 | 0 |
| Encoded Reasoning / Steganography Detection | Detecting hidden meaning or covert communication in AI-generated chains of thought. | emerging | 0 | 0 |
| Monitoring & Anomaly Detection | Runtime behavioral monitoring of deployed AI systems to catch unexpected behavior. | active | 0 | 0 |
| Multi-Agent Safety | Safety challenges and solutions for systems of multiple interacting AI agents. | active | 0 | 0 |
| Output Filtering | Post-generation safety filters that screen model outputs before delivery. | active | 0 | 0 |
| Sandboxing / Containment | Physical and logical isolation of AI systems to limit potential harm. | active | 0 | 0 |
| Structured Access / API-Only | Deployment models that restrict access to model weights, providing only API interfaces. | active | 0 | 0 |
| Tool-Use Restrictions | Limiting which external tools and actions AI agents can access. | active | 0 | 0 |
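Two of the sub-areas above, output filtering and tool-use restrictions, can be sketched in a few lines. Everything here is a hypothetical illustration: the blocked patterns and the tool allowlist are invented for the example, not drawn from any deployed system.

```python
import re

# Hypothetical deny-list for a post-generation output filter.
BLOCKED_PATTERNS = [re.compile(p) for p in [r"rm\s+-rf", r"DROP\s+TABLE"]]

# Hypothetical allowlist for tool-use restrictions on an agent.
ALLOWED_TOOLS = {"search", "calculator"}


def output_passes_filter(text: str) -> bool:
    """Output filtering: screen generated text before it is delivered."""
    return not any(p.search(text) for p in BLOCKED_PATTERNS)


def tool_call_allowed(tool_name: str) -> bool:
    """Tool-use restriction: only explicitly allowlisted tools may run."""
    return tool_name in ALLOWED_TOOLS
```

Both checks run outside the model, which is the common thread across these sub-areas: the safeguard does not depend on the model cooperating.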