Longterm Wiki

Representation Engineering

Interpretability · emerging
Controlling AI behavior by directly manipulating internal representations, including activation addition and steering vectors.
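As a rough illustration of activation addition, the sketch below builds a steering vector as the difference between mean activations on two contrasting sets of examples, then adds a scaled copy of it to a hidden state. The difference-of-means recipe, the toy three-dimensional activations, the function names, and the `alpha` scale are all assumptions made for illustration; this is not necessarily the procedure used in Zou et al., and a real setup would read and write activations inside a model's forward pass.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Difference of mean activations (an assumed, commonly used recipe)."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def add_steering(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Activation addition: shift a hidden state along the steering direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy (hypothetical) activations recorded at one layer for two contrasting prompt sets.
positive_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
negative_acts = [[0.0, 2.0, 1.0], [0.0, 2.0, 3.0]]

direction = steering_vector(positive_acts, negative_acts)
steered = add_steering([0.5, 0.5, 0.5], direction, alpha=2.0)
```

The scale `alpha` controls how strongly the hidden state is pushed along the direction; a negative value would push it the opposite way.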
Organizations: 3
Key Papers: 1
First Proposed: 2023 (Zou et al.)
Cluster: Interpretability
Parent Area: Interpretability

Key Papers & Resources (1)

Tags

interpretability, activation-steering, control