Finding Feature Representations

Interpretability · emerging

Research that goes beyond sparse autoencoders (SAEs) to develop alternative methods for identifying latent features in model activations.

Organizations

Key Papers

Cluster: Interpretability