Representation Engineering
Interpretability · active
Reading and intervening on a model's internal representations (activations) to interpret and steer its behavior (e.g., representation reading, activation addition).
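The two operations named above can be illustrated with a minimal sketch. It assumes that hidden-state vectors from one layer of a model are already available as plain lists of floats. It also assumes that a concept direction is estimated as the difference of mean activations between two contrasting prompt sets. This is one common choice; other estimates, such as PCA over paired differences, are also used. Every function name, the toy data, and the scale `alpha` are hypothetical and used only for illustration. In a real model the addition would happen inside the forward pass at a chosen layer.

```python
import math
from statistics import fmean

Vector = list[float]


def mean_vector(vectors: list[Vector]) -> Vector:
    # Element-wise mean over a batch of activation vectors (hypothetical helper).
    return [fmean(column) for column in zip(*vectors)]


def steering_vector(pos_acts: list[Vector], neg_acts: list[Vector]) -> Vector:
    # Difference of means between activations on contrasting prompts
    # (e.g., prompts that do vs. do not express a concept).
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return [p - n for p, n in zip(mu_pos, mu_neg)]


def add_activation(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    # Activation addition: shift a hidden state along the direction,
    # scaled by a hypothetical strength `alpha`.
    return [h + alpha * d for h, d in zip(hidden, direction)]


def read_concept(hidden: Vector, direction: Vector) -> float:
    # Representation reading: scalar projection of the hidden state
    # onto the unit-normalized concept direction.
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(h * d for h, d in zip(hidden, direction)) / norm


# Toy activations standing in for one layer's hidden states.
pos = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
neg = [[0.0, 2.0, 0.0], [0.0, 2.0, 2.0]]
direction = steering_vector(pos, neg)  # [2.0, 0.0, -1.0]

hidden = [0.5, 1.0, 1.0]
steered = add_activation(hidden, direction, alpha=0.5)  # [1.5, 1.0, 0.5]

print(read_concept(hidden, direction))   # 0.0
print(read_concept(steered, direction))  # about 1.118, i.e. 0.5 * sqrt(5)
```

Because the projection is linear, adding `alpha * direction` raises the reading by exactly `alpha` times the direction's norm. This shows how the same direction can serve both to read a concept and to steer it.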
Key Papers: 1
Tags: function:assurance, scope:technique
Key Papers & Resources (1)
SEMINAL