Back
Intro to ML Safety Course
course.mlsafety.org
Created by the Center for AI Safety, this is one of the most comprehensive structured curricula for technical AI safety, suitable for ML practitioners seeking a systematic introduction to safety concepts and methods.
Metadata
Importance: 82/100
Tags: documentation, educational
Summary
A structured university-level course on machine learning safety developed by the Center for AI Safety, covering topics from robustness and anomaly detection to alignment and systemic safety. The course includes lecture recordings, slides, notes, and coding assignments across modules on safety engineering, robustness, monitoring, alignment, and emerging risks.
Key Points
- Covers safety engineering fundamentals: risk decomposition, accident models (Swiss Cheese, STAMP), and black swan risks
- Technical modules on adversarial robustness, anomaly detection, and interpretable uncertainty with hands-on coding assignments
- Includes alignment-focused content on goal misgeneralization, scalable oversight, and emergent capabilities
- Draws on established safety engineering frameworks from other high-stakes industries and applies them to ML systems
- Freely accessible with full lecture recordings, slides, written notes, and Colab notebooks for self-study or structured learning
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Center for AI Safety (CAIS) | Organization | 42.0 |
| Dan Hendrycks | Person | 19.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 9, 2026 · 3 KB
Syllabus | Intro to ML Safety
Syllabus
Legend: 🎥 lecture recording, 🖥️ slides, 📖 notes, 📝 written questions, ⌨️ coding assignment.
Background
1 Introduction 🎥, 🖥️
2 Optional Deep Learning Review 🎥, 🖥️, 📖, 📝, ⌨️: building blocks, optimizers, losses, datasets

Safety Engineering
3 Risk Decomposition 🎥, 🖥️: risk analysis definitions, disaster risk equation, decomposition of safety areas, ability to cope and existential risk
4 Accident Models 🎥, 🖥️: FMEA, Bow Tie model, Swiss Cheese model, defense in depth, preventative and protective measures, complex systems, nonlinear causality, emergence, STAMP
5 Black Swans 🎥, 🖥️: unknown unknowns, long-tailed distributions, multiplicative processes, extremistan
► Review questions 📝

Robustness
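The disaster risk equation from the Risk Decomposition lecture can be sketched numerically. This is a hedged illustration of the heuristic decomposition only; the function name and numeric values are hypothetical, and the lecture treats this as a framing device rather than a precise formula:

```python
def disaster_risk(hazard, exposure, vulnerability, ability_to_cope):
    """Heuristic disaster risk decomposition: risk grows with the
    hazard's probability/severity, the system's exposure to it, and
    its vulnerability, and shrinks with the ability to cope."""
    return hazard * exposure * vulnerability / ability_to_cope

# Doubling the ability to cope halves the risk, all else equal.
baseline = disaster_risk(hazard=2.0, exposure=1.5, vulnerability=1.0,
                         ability_to_cope=1.0)
improved = disaster_risk(hazard=2.0, exposure=1.5, vulnerability=1.0,
                         ability_to_cope=2.0)
```

The multiplicative form captures the lecture's point that driving any one factor toward zero (or coping ability very high) suppresses overall risk.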
6 Adversarial Robustness 🎥, 🖥️, 📖, ⌨️: optimization pressure, PGD, untargeted vs targeted attacks, adversarial evaluation, white box vs black box, transferability, unforeseen attacks, text attacks, robustness certificates
7 Black Swan Robustness 🎥, 🖥️, 📖: stress tests, train-test mismatch, adversarial distribution shifts, simulated scenarios for robustness
8 Review questions 📝

Monitoring
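To make the PGD topic from the robustness module concrete, here is a minimal NumPy sketch of an untargeted L-infinity PGD attack against a logistic-regression classifier. The model, function name, and parameter values are hypothetical stand-ins (the course's assignments attack deep networks in PyTorch), but the loop is a faithful instance of projected gradient descent:

```python
import numpy as np

def pgd_attack(w, b, x, y, eps=0.3, alpha=0.05, steps=20):
    """Untargeted L-infinity PGD against p(y=1|x) = sigmoid(w.x + b):
    repeatedly step in the sign of the loss gradient, then project
    back into the eps-ball around the clean input x."""
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
        grad = (p - y) * w                        # d(logistic loss)/dx
        x_adv = x_adv + alpha * np.sign(grad)     # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into eps-ball
    return x_adv
```

The sign step and projection are what distinguish PGD from plain gradient ascent: the perturbation stays inside the attacker's L-infinity budget while maximally increasing the loss.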
8 Anomaly Detection 🎥, 🖥️, 📖, ⌨️: AUROC/AUPR/FPR95, likelihoods and detection, MSP baseline, OE, ViM, anomaly datasets, one-class learning, detecting adversaries, error detection
9 Interpretable Uncertainty 🎥, 🖥️, 📖: calibration vs sharpness, proper scoring rules, Brier score, RMS calibration error, reliability diagrams, confidence intervals, quantile prediction
10 Transparency 🎥, 🖥️: saliency maps, token heatmaps, feature visualizations, ProtoPNet
11 Trojans 🎥, 🖥️, 📖, ⌨️: hidden functionality from poisoning, treacherous turns
12 Detecting Emergent Behavior 🎥, 🖥️, 📖: emergent capabilities, instrumental convergence, Goodhart's law, proxy gaming
13 Review questions 📝

Control
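The MSP baseline and AUROC metric from the anomaly detection module can be sketched in a few lines of NumPy. This is a hedged illustration: `msp_score` and `auroc` are hypothetical helper names, and real evaluations typically use library routines such as scikit-learn's `roc_auc_score`:

```python
import numpy as np

def msp_score(logits):
    """MSP baseline: score each input by its maximum softmax
    probability. In-distribution inputs tend to score higher
    than out-of-distribution (anomalous) ones."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def auroc(in_scores, out_scores):
    """AUROC read as a probability: the chance that a randomly drawn
    in-distribution example outscores a randomly drawn
    out-of-distribution example (ties count half)."""
    a = np.asarray(in_scores)[:, None]
    b = np.asarray(out_scores)[None, :]
    return (a > b).mean() + 0.5 * (a == b).mean()
```

An AUROC of 1.0 means the detector ranks every in-distribution input above every anomaly; 0.5 is chance level.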
13 Honest Models 🎥, 🖥️: truthful vs. honest, inverse scaling, instances of model dishonesty
14 Power Aversion 🖥️: measuring power, the power-seeking argument, power penalties
15 Machine Ethics 🎥, 🖥️, ⌨️: normative ethics background, human values, value learning with comparisons, translating moral knowledge into action, moral parliament, value clarification

Systemic Safety
16 ML for Improved Decision-Making 🎥, 🖥️, 📖: forecasting, brainstorming
17 ML for Cyberdefense 🎥, 🖥️: intrusion detection, detecting malicious programs, automated patching, fuzzing
18 Cooperative AI 🎥, 🖥️, 📖: Nash equilibria, dominant strategies, stag hunt, Pareto improvements, cooperation mechanisms, morality as cooperation, cooperative dispositions, collusion externalities

Additional Existential Risk Discussion
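The stag hunt from the Cooperative AI module can be checked for pure-strategy Nash equilibria with a brute-force sketch. The payoff numbers here are illustrative, not taken from the course materials:

```python
import itertools
import numpy as np

# Stag hunt: action 0 = hunt stag, action 1 = hunt hare.
# Cooperating on the stag pays most, but hare is the safe choice.
R = np.array([[4, 0],   # row player's payoffs
              [3, 3]])
C = R.T                 # symmetric game: column player's payoffs

def pure_nash(R, C):
    """Return all pure-strategy Nash equilibria: action profiles where
    neither player can gain by unilaterally deviating."""
    eqs = []
    for i, j in itertools.product(range(R.shape[0]), range(R.shape[1])):
        if R[i, j] >= R[:, j].max() and C[i, j] >= C[i, :].max():
            eqs.append((i, j))
    return eqs
```

Both (stag, stag) and (hare, hare) come out as equilibria, which is exactly the coordination problem the module uses to motivate cooperation mechanisms: mutual cooperation is Pareto-superior, but the safe equilibrium can lock in.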
19 X-Risk Overview 🎥, 🖥️: arguments for x-risk
20 Possible Existential Hazards 🎥, 🖥️: weaponization, proxy gaming, treacherous turn, deceptive alignment, value lock-in, persuasive AI
21 Sa
... (truncated, 3 KB total)
Resource ID: 65c9fe2d57a4eb4c | Stable ID: sid_znd5HyEClL