Longterm Wiki

Finding Feature Representations

Interpretability · emerging

Research beyond sparse autoencoders (SAEs) into alternative methods for identifying latent features in model activations.

Cluster: Interpretability

Tags

function:assurance, scope:technique