Longterm Wiki

Circuit Breakers

AI Controlactive

Inference-time interventions that halt model execution when unsafe behavior is detected.

First Proposed: 2024 (Zou et al.)
Cluster: AI Control
Parent Area: AI Control

Tags

function:robustnessstage:inferencescope:technique