Longterm Wiki

Activation Monitoring

Interpretabilityemerging

Using probes on internal activations during inference to catch misaligned actions in real-time.

Cluster: Interpretability
Parent Area: Interpretability

Tags

function:assurancestage:inferencescope:technique