Robust Unlearning
Alignment Training
emerging
Removing dangerous knowledge from model weights in a way that resists relearning.
Cluster: Alignment Training
Tags
function:specification
stage:training
scope:technique