Representation Engineering

Interpretability (emerging)

Controlling AI behavior by directly manipulating internal representations, including activation addition and steering vectors.
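As a rough illustration of activation addition, the toy sketch below builds a steering vector as the difference between mean activations for two contrasting sets of prompts, then adds a scaled copy of it to a hidden state. This difference-of-means construction is one common way to obtain a steering vector and is an assumption here, not necessarily the procedure in Zou et al. The function names (`mean_vector`, `steering_vector`, `add_activation`), the scale `alpha`, and the hand-written activations are all hypothetical; in practice the activations would be read from a chosen layer of a real model.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Difference of mean activations between two contrasting prompt sets."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def add_activation(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Activation addition: shift a hidden state along the steering direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations standing in for one layer's hidden states (hypothetical data).
pos = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]  # prompts exhibiting the target concept
neg = [[0.0, 2.0, 1.0], [0.0, 2.0, 1.0]]  # prompts exhibiting the opposite concept

direction = steering_vector(pos, neg)
steered = add_activation([0.5, 0.5, 0.5], direction, alpha=2.0)
```

The sign and size of `alpha` set the direction and strength of the shift; a negative value would push the hidden state away from the target concept instead.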

Organizations: 3

Key Papers: 1

First Proposed: 2023 (Zou et al.)

Cluster: Interpretability

Parent Area: Interpretability

Key Papers & Resources (1)

SEMINAL

Representation Engineering: A Top-Down Approach to AI Transparency

Zou et al., 2023

Tags

interpretability, activation-steering, control