Longterm Wiki

Linear Probing

Interpretability (active)

A lightweight interpretability technique that trains linear classifiers on a model's internal activations to detect whether particular features are represented.
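The idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It makes three assumptions: activations have already been extracted from one layer and cached as a NumPy array, synthetic activations with a single planted concept direction stand in for real model data, and the probe is plain logistic regression fit by gradient descent. All function names (`make_synthetic_activations`, `train_linear_probe`, `probe_accuracy`) are hypothetical.

```python
import numpy as np


def make_synthetic_activations(n=2000, d=64, shift=1.5, seed=0):
    """Hypothetical stand-in for cached model activations.

    A binary concept is planted along one random unit direction. In real use,
    `acts` would be read from a chosen layer of an actual model.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    labels = rng.integers(0, 2, size=n)
    acts = rng.normal(size=(n, d))
    # Push label-1 examples +shift and label-0 examples -shift along the direction.
    acts += np.outer(2.0 * labels - 1.0, direction) * shift
    return acts, labels, direction


def train_linear_probe(x, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression probe sigmoid(x @ w + b) with a small L2 penalty,
    using full-batch gradient descent."""
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        err = probs - y  # per-example gradient of cross-entropy w.r.t. the logit
        w -= lr * (x.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b


def probe_accuracy(x, y, w, b):
    """Fraction of examples where the probe's decision (logit > 0) matches the label."""
    return float(((x @ w + b > 0) == (y == 1)).mean())


acts, labels, true_dir = make_synthetic_activations()
x_tr, y_tr, x_te, y_te = acts[:1000], labels[:1000], acts[1000:], labels[1000:]

w, b = train_linear_probe(x_tr, y_tr)
test_acc = probe_accuracy(x_te, y_te, w, b)

# Sanity baseline: a probe trained on shuffled labels should sit near chance.
w_ctrl, b_ctrl = train_linear_probe(x_tr, np.random.default_rng(1).permutation(y_tr))
control_acc = probe_accuracy(x_te, y_te, w_ctrl, b_ctrl)

# How well does the learned weight vector align with the planted direction?
cosine = float(w @ true_dir / np.linalg.norm(w))

print(f"probe test accuracy:    {test_acc:.3f}")
print(f"shuffled-label control: {control_acc:.3f}")
print(f"cos(w, true direction): {cosine:.3f}")
```

High held-out accuracy relative to the shuffled-label baseline indicates the concept is linearly decodable from these activations; on its own it does not show the model actually uses that information.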

Cluster: Interpretability
Parent Area: Interpretability

Tags

function:assurance, scope:technique