Longterm Wiki

Toy Models for Interpretability

Interpretability | active

Small, simplified proxy models that capture key deep learning dynamics for interpretability research.

Cluster: Interpretability

Tags

function:assurance, scope:technique