Longterm Wiki

Capability Unlearning / Removal

Methods that remove specific dangerous capabilities from trained AI models, directly addressing misuse risk by eliminating harmful knowledge. Current techniques still face challenges in verifying that removal succeeded, preventing recovery of the removed capability (e.g. through fine-tuning), and avoiding degradation of general performance.
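A common family of unlearning techniques trades off two objectives: ascend the loss on a "forget" set (the capability to remove) while descending the loss on a "retain" set (everything else). The toy sketch below illustrates this on a logistic-regression model; it is a hypothetical, minimal illustration of the general idea, not a specific published method, and all names in it are invented for the example.

```python
import numpy as np

# Toy gradient-ascent unlearning sketch (illustrative only): train one
# shared linear model on two tasks, then "unlearn" one of them by
# ascending its loss while continuing to descend the retained task's loss.

rng = np.random.default_rng(0)

def make_task(center):
    # Two Gaussian blobs around +center / -center, labeled 1 / 0.
    pos = rng.normal(center, 0.3, size=(50, 2))
    neg = rng.normal(-center, 0.3, size=(50, 2))
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(50), np.zeros(50)])
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

def grad(w, X, y):
    return X.T @ (sigmoid(X @ w) - y) / len(y)

# The two "capabilities" live on different input dimensions, so one
# shared weight vector can represent both.
retain_X, retain_y = make_task(np.array([2.0, 0.0]))  # capability to keep
forget_X, forget_y = make_task(np.array([0.0, 2.0]))  # capability to remove

w = np.zeros(2)
for _ in range(200):  # joint training on both tasks
    w -= 0.5 * (grad(w, retain_X, retain_y) + grad(w, forget_X, forget_y))

before = loss(w, forget_X, forget_y)

for _ in range(100):  # unlearning phase
    w -= 0.5 * grad(w, retain_X, retain_y)  # descend: keep retained skill
    w += 0.5 * grad(w, forget_X, forget_y)  # ascend: degrade forget skill

after = loss(w, forget_X, forget_y)
retain_acc = np.mean((sigmoid(retain_X @ w) > 0.5) == retain_y)
print(f"forget loss: {before:.3f} -> {after:.3f}, retain acc: {retain_acc:.2f}")
```

The sketch also hints at the challenges the page names: the forget-set loss rising is weak evidence of removal (verification), the weights may still be a short fine-tune away from the capability (recovery), and the ascent step can damage the retained task if the tasks overlap (performance degradation).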

Related Pages

Analysis

AI Uplift Assessment Model
Bioweapons Attack Chain Model
AI-Bioweapons Timeline Model

Approaches

Refusal Training
Dangerous Capability Evaluations
Eliciting Latent Knowledge (ELK)

Key Debates

AI Misuse Risk Cruxes

Tags

unlearning, capability-removal, misuse-prevention, model-editing, bioweapons