Finding Feature Representations
Interpretability | emerging
Research beyond SAEs into alternative methods for identifying latent features in model activations.
Cluster: Interpretability
Parent Area: Mechanistic Interpretability
Tags: function:assurance, scope:technique