Linear Probing
Interpretability · active
Cluster: Interpretability
Parent Area: Interpretability
Tags
function:assurance
scope:technique
A lightweight interpretability technique: train a linear classifier (a "probe") on a model's internal activations to test whether a given feature is linearly decodable from them.
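As a rough illustration of the idea, here is a minimal sketch in Python using only NumPy. It is not tied to any particular model or library: the "activations" are synthetic Gaussian vectors with a binary feature planted along one random direction, standing in for hidden states you would normally extract from a real model. The function names (train_linear_probe, probe_accuracy), the hyperparameters, and the shuffled-label baseline are all assumptions made for this example.

```python
import numpy as np

def train_linear_probe(acts, labels, lr=0.1, steps=500, l2=1e-3):
    """Fit a logistic-regression probe (w, b) with plain gradient descent."""
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))  # sigmoid of the logits
        err = probs - labels                           # gradient of the log loss w.r.t. logits
        w -= lr * (acts.T @ err / n + l2 * w)          # L2-regularized weight update
        b -= lr * err.mean()
    return w, b

def probe_accuracy(acts, labels, w, b):
    """Fraction of examples where the probe's decision matches the label."""
    preds = (acts @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

# Synthetic stand-in for real model activations (assumption: in practice these
# would be hidden states collected from one layer of the model under study).
rng = np.random.default_rng(0)
n, d = 2000, 64
labels = rng.integers(0, 2, size=n).astype(float)
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
# Plant the feature: shift each activation along `direction` according to its label.
acts = rng.normal(size=(n, d)) + 4.0 * np.outer(labels - 0.5, direction)

train, test = slice(0, 1500), slice(1500, n)

# Probe for the planted feature and evaluate on held-out examples.
w, b = train_linear_probe(acts[train], labels[train])
test_acc = probe_accuracy(acts[test], labels[test], w, b)

# Baseline: identical procedure with labels shuffled, so there is nothing to decode.
shuffled = rng.permutation(labels)
w_c, b_c = train_linear_probe(acts[train], shuffled[train])
control_acc = probe_accuracy(acts[test], shuffled[test], w_c, b_c)

print(f"probe accuracy:   {test_acc:.3f}")
print(f"shuffled control: {control_acc:.3f}")
```

High held-out accuracy relative to the shuffled-label baseline suggests the feature is linearly decodable from the activations. It does not by itself show that the model uses that feature, which is why a baseline or control comparison is usually reported alongside the probe's accuracy.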