Longterm Wiki
Activation Monitoring
Interpretability · emerging
Using probes and monitors on model internals to detect deception, harmful intent, or anomalous reasoning in real time.
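A minimal sketch of the probing idea, assuming a linear probe (logistic regression) trained on activation vectors from a single layer. The activations below are synthetic stand-ins. All names (`train_linear_probe`, `probe_score`, `monitor`, `build_demo_probe`) are hypothetical illustrations, not any organization's actual tooling. A real monitor would read activations from a model at inference time and calibrate its threshold on held-out labeled data.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_probe(xs, ys, lr=0.1, epochs=200):
    """Fit a logistic-regression probe (weights w, bias b) by batch gradient descent.

    xs: list of activation vectors (lists of floats); ys: 0/1 labels,
    e.g. 1 = "deceptive" example, 0 = benign (labels are an assumption here).
    """
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = probe_score(w, b, x) - y  # gradient of log-loss wrt logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_score(w, b, x):
    """Probability the probe assigns to the flagged class for activation x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def monitor(w, b, activation, threshold=0.5):
    """Return True if this activation should be flagged for review."""
    return probe_score(w, b, activation) >= threshold


def synthetic_activations(n, dim, mean0, rng):
    """Hypothetical stand-in for layer activations: Gaussian noise with the
    concept encoded as a shifted mean along dimension 0."""
    return [[rng.gauss(mean0 if i == 0 else 0.0, 1.0) for i in range(dim)]
            for _ in range(n)]


def build_demo_probe(dim=4, n=100, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    flagged = synthetic_activations(n, dim, 2.0, rng)
    benign = synthetic_activations(n, dim, -2.0, rng)
    xs = flagged + benign
    ys = [1] * n + [0] * n
    return train_linear_probe(xs, ys)


if __name__ == "__main__":
    w, b = build_demo_probe()
    print(monitor(w, b, [3.0, 0.0, 0.0, 0.0]))
    print(monitor(w, b, [-3.0, 0.0, 0.0, 0.0]))
```

A linear probe is used because it is cheap enough to run on every forward pass, which is what makes real-time monitoring practical; the trade-off is that it can only detect concepts that are (approximately) linearly represented in the chosen layer.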
Organizations: 3
Cluster: Interpretability
Parent Area: Interpretability
Tags: interpretability, monitoring, probes