Representation Engineering
Interpretability
emerging
Representation engineering controls AI behavior by directly manipulating a model's internal representations, using techniques such as activation addition and steering vectors.
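As a rough illustration of activation addition with a steering vector, the sketch below derives a direction as the difference of mean activations between two contrasting sets of examples, then adds a scaled copy of it to a hidden state. This is a toy, assumed setup: the function names (`mean_vector`, `steering_vector`, `add_activation`), the scaling coefficient `alpha`, and the 3-dimensional numbers are all hypothetical and do not come from any real model or from the cited paper. Difference of means is only one common way to obtain such a direction.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Difference of mean activations between two contrasting example sets."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def add_activation(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the steering direction, scaled by alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 3-dimensional "activations" (made-up numbers, not from a real model).
positive_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
negative_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

# Direction separating the two sets, then a steered hidden state.
direction = steering_vector(positive_acts, negative_acts)
steered = add_activation([0.5, 0.5, 0.5], direction, alpha=2.0)
```

In a real model the same addition would be applied to a chosen layer's activations during the forward pass. The choice of layer and of `alpha` is typically tuned empirically.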
Organizations: 3
Key Papers: 1
First Proposed: 2023 (Zou et al.)
Cluster: Interpretability
Parent Area: Interpretability
Key Papers & Resources (1)
Seminal: Zou et al. (2023). Representation Engineering: A Top-Down Approach to AI Transparency.
Tags: interpretability, activation-steering, control