Longterm Wiki

Refusal Training

Alignment Trainingactive

Safety-specific fine-tuning to train models to decline harmful or dangerous requests.

Cluster: Alignment Training

Tags

function:specificationstage:trainingscope:technique