Toy Models for Interpretability
Interpretability
active
Small, simplified model proxies that capture key deep learning dynamics for interpretability research.
Cluster: Interpretability
Parent Area: Mechanistic Interpretability
Tags
function:assurance
scope:technique