Activation Monitoring
InterpretabilityemergingUsing probes on internal activations during inference to catch misaligned actions in real-time.
Cluster: Interpretability
Parent Area: Interpretability
Tags
function:assurancestage:inferencescope:technique