Safety Approaches Table

Category | Current research investment | Safety vs capability progress ratio | Recommended funding change | How much does this reduce catastrophic risk? | Does it make AI more capable? | Is the world safer with this? | Does it work as AI gets smarter? | Does it work against deceptive AI? | Works for superintelligent AI? | Current adoption level | |
|---|---|---|---|---|---|---|---|---|---|---|---|
Training & Alignment | $1B+/yr | CAPABILITY-DOMINANT | REDUCE | LOW-MEDIUM | DOMINANT | UNCLEAR | BREAKS | NONE | NO | UNIVERSAL | |
Training & Alignment | $50-200M/yr | CAPABILITY-LEANING | MAINTAIN | MEDIUM | SIGNIFICANT | UNCLEAR | PARTIAL | WEAK | UNLIKELY | WIDESPREAD | |
Training & Alignment | $5-20M/yr | SAFETY-LEANING | INCREASE | UNKNOWN | SOME | UNCLEAR | MAYBE | PARTIAL | MAYBE | EXPERIMENTAL | |
Training & Alignment | $100-500M/yr | BALANCED | MAINTAIN | MEDIUM | SIGNIFICANT | HELPFUL | PARTIAL | PARTIAL | UNLIKELY | WIDESPREAD | |
Training & Alignment | $10-50M/yr | SAFETY-LEANING | INCREASE | UNKNOWN | SOME | UNCLEAR | UNKNOWN | UNKNOWN | MAYBE | EXPERIMENTAL | |
Training & Alignment | $500M+/yr | CAPABILITY-DOMINANT | REDUCE | LOW | SIGNIFICANT | UNCLEAR | PARTIAL | NONE | NO | UNIVERSAL | |
Training & Alignment | $1-5M/yr | SAFETY-DOMINANT | INCREASE | MEDIUM | NEUTRAL | HELPFUL | UNKNOWN | PARTIAL | MAYBE | NONE | |
Training & Alignment | $10-30M/yr | SAFETY-LEANING | INCREASE | MEDIUM | SOME | HELPFUL | PARTIAL | WEAK | UNLIKELY | WIDESPREAD | |
Training & Alignment | $50-150M/yr | BALANCED | MAINTAIN | LOW-MEDIUM | SOME | HELPFUL | PARTIAL | NONE | NO | UNIVERSAL | |
Training & Alignment | $5-20M/yr | SAFETY-LEANING | INCREASE | MEDIUM | SOME | HELPFUL | UNKNOWN | PARTIAL | MAYBE | EXPERIMENTAL | |
Interpretability | $50-150M/yr | SAFETY-DOMINANT | PRIORITIZE | LOW (now) / HIGH (potential) | NEUTRAL | HELPFUL | UNKNOWN | STRONG (if works) | MAYBE | EXPERIMENTAL | |
Interpretability | $10-30M/yr | SAFETY-DOMINANT | INCREASE | LOW (now) | NEUTRAL | HELPFUL | PARTIAL | PARTIAL | UNKNOWN | EXPERIMENTAL | |
Interpretability | $5-20M/yr | SAFETY-LEANING | INCREASE | MEDIUM | SOME | HELPFUL | PARTIAL | PARTIAL | UNKNOWN | EXPERIMENTAL | |
Interpretability | $5-10M/yr | SAFETY-DOMINANT | MAINTAIN | LOW | NEUTRAL | HELPFUL | YES | PARTIAL | MAYBE | WIDESPREAD | |
Evaluation | $20-50M/yr | SAFETY-DOMINANT | INCREASE | MEDIUM | NEUTRAL | HELPFUL | PARTIAL | WEAK | UNLIKELY | WIDESPREAD | |
Evaluation | $50-200M/yr | BALANCED | MAINTAIN | LOW-MEDIUM | NEUTRAL | HELPFUL | PARTIAL | NONE | NO | UNIVERSAL | |
Evaluation | $10-30M/yr | SAFETY-DOMINANT | PRIORITIZE | MEDIUM | NEUTRAL | HELPFUL | UNKNOWN | WEAK | UNLIKELY | SOME | |
Evaluation | $10-30M/yr | SAFETY-DOMINANT | INCREASE | LOW-MEDIUM | NEUTRAL | HELPFUL | PARTIAL | WEAK | UNLIKELY | SOME | |
Evaluation | $5-15M/yr | SAFETY-DOMINANT | PRIORITIZE | MEDIUM-HIGH | TAX | HELPFUL | PARTIAL | PARTIAL | UNLIKELY | EXPERIMENTAL | |
Evaluation | $10-30M/yr | SAFETY-LEANING | INCREASE | MEDIUM | SOME | HELPFUL | PARTIAL | WEAK | NO | SOME | |
Evaluation | $5-15M/yr | SAFETY-DOMINANT | PRIORITIZE | HIGH (if works) | NEUTRAL | HELPFUL | UNKNOWN | UNKNOWN | UNKNOWN | EXPERIMENTAL | |
Architectural | $50-200M/yr | BALANCED | MAINTAIN | LOW | TAX | NEUTRAL | BREAKS | NONE | NO | UNIVERSAL | |
Architectural | (included in RLHF) | BALANCED | MAINTAIN | LOW-MEDIUM | TAX | NEUTRAL | BREAKS | NONE | NO | UNIVERSAL | |
Architectural | $20-50M/yr | SAFETY-LEANING | INCREASE | MEDIUM | TAX | HELPFUL | PARTIAL | PARTIAL | UNLIKELY | SOME | |
Architectural | $10-30M/yr | SAFETY-DOMINANT | INCREASE | MEDIUM | TAX | HELPFUL | PARTIAL | PARTIAL | PARTIAL | WIDESPREAD | |
Architectural | $10-30M/yr | SAFETY-DOMINANT | INCREASE | MEDIUM | NEUTRAL | HELPFUL | PARTIAL | WEAK | NO | SOME | |
Architectural | $10-30M/yr | SAFETY-LEANING | INCREASE | MEDIUM | TAX | HELPFUL | PARTIAL | WEAK | NO | SOME | |
Architectural | $20-50M/yr | SAFETY-LEANING | MAINTAIN | MEDIUM-HIGH | TAX | HELPFUL | YES | N/A | PARTIAL | WIDESPREAD | |
Governance | $5-20M/yr | SAFETY-DOMINANT | PRIORITIZE | MEDIUM-HIGH | NEGATIVE | HELPFUL | YES | N/A | PARTIAL | SOME | |
Governance | $5-15M/yr | SAFETY-DOMINANT | INCREASE | MEDIUM | NEUTRAL | HELPFUL | UNKNOWN | PARTIAL | UNLIKELY | SOME | |
Governance | $10-30M/yr | SAFETY-DOMINANT | INCREASE | MEDIUM | TAX | HELPFUL | PARTIAL | WEAK | NO | SOME | |
Governance | $5-15M/yr | SAFETY-DOMINANT | INCREASE | LOW-MEDIUM | TAX | HELPFUL | YES | N/A | PARTIAL | EXPERIMENTAL | |
Governance | $1-5M/yr | SAFETY-DOMINANT | MAINTAIN | HIGH (if implemented) | NEGATIVE | UNCLEAR | UNKNOWN | N/A | YES (if works) | NONE | |
Governance | $10-30M/yr | SAFETY-DOMINANT | PRIORITIZE | MEDIUM-HIGH | TAX | HELPFUL | PARTIAL | N/A | PARTIAL | EXPERIMENTAL | |
Theoretical | $5-20M/yr | SAFETY-DOMINANT | INCREASE | HIGH (if achievable) | TAX | HELPFUL | UNKNOWN | STRONG (if works) | MAYBE | NONE | |
Theoretical | $10-50M/yr | SAFETY-DOMINANT | INCREASE | CRITICAL (if works) | TAX | HELPFUL | UNKNOWN | STRONG (by design) | YES (if works) | NONE | |
Theoretical | $1-5M/yr | SAFETY-DOMINANT | PRIORITIZE | HIGH (if solved) | NEUTRAL | HELPFUL | UNKNOWN | PARTIAL | MAYBE | NONE | |
Theoretical | $5-20M/yr | BALANCED | INCREASE | MEDIUM | SOME | HELPFUL | PARTIAL | N/A | UNKNOWN | EXPERIMENTAL | |
Theoretical | $5-15M/yr | SAFETY-LEANING | PRIORITIZE | HIGH (if solved) | SOME | HELPFUL | UNKNOWN | STRONG (if solved) | MAYBE | NONE | |
Theoretical | $5-20M/yr | SAFETY-DOMINANT | INCREASE | HIGH (if works) | NEGATIVE | HELPFUL | UNKNOWN | WEAK | UNLIKELY | EXPERIMENTAL | |
Theoretical | $10-30M/yr | SAFETY-DOMINANT | PRIORITIZE | HIGH | TAX | HELPFUL | UNKNOWN | PARTIAL | CRITICAL QUESTION | SOME |
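
For readers who want to work with the table programmatically, the following is a minimal sketch of how the pipe-delimited rows could be parsed and tallied. It is an illustration, not part of the original table: the names `COLUMNS`, `SAMPLE_ROWS`, `parse_row`, `parse_table`, and `tally` are hypothetical, the label "Category" for the first column is an assumption (the source header does not name it), and only four rows copied from the table are used as input.

```python
from collections import Counter

# Column labels taken from the table header.
# "Category" is an assumed label: the source header leaves the first column unnamed.
COLUMNS = [
    "Category",
    "Current research investment",
    "Safety vs capability progress ratio",
    "Recommended funding change",
    "How much does this reduce catastrophic risk?",
    "Does it make AI more capable?",
    "Is the world safer with this?",
    "Does it work as AI gets smarter?",
    "Does it work against deceptive AI?",
    "Works for superintelligent AI?",
    "Current adoption level",
]

# Four rows copied verbatim from the table (one per selected category).
SAMPLE_ROWS = """\
Training & Alignment | $1B+/yr | CAPABILITY-DOMINANT | REDUCE | LOW-MEDIUM | DOMINANT | UNCLEAR | BREAKS | NONE | NO | UNIVERSAL | |
Interpretability | $50-150M/yr | SAFETY-DOMINANT | PRIORITIZE | LOW (now) / HIGH (potential) | NEUTRAL | HELPFUL | UNKNOWN | STRONG (if works) | MAYBE | EXPERIMENTAL | |
Governance | $5-20M/yr | SAFETY-DOMINANT | PRIORITIZE | MEDIUM-HIGH | NEGATIVE | HELPFUL | YES | N/A | PARTIAL | SOME | |
Theoretical | $10-30M/yr | SAFETY-DOMINANT | PRIORITIZE | HIGH | TAX | HELPFUL | UNKNOWN | PARTIAL | CRITICAL QUESTION | SOME |
"""


def parse_row(line):
    """Split one pipe-delimited row into a dict keyed by column label."""
    cells = [cell.strip() for cell in line.split("|")]
    # Rows end with one or two empty trailing cells; drop them.
    while cells and cells[-1] == "":
        cells.pop()
    return dict(zip(COLUMNS, cells))


def parse_table(text):
    """Parse every non-blank line of the text as a table row."""
    return [parse_row(line) for line in text.splitlines() if line.strip()]


def tally(rows, column):
    """Count how often each rating appears in the given column."""
    return Counter(row[column] for row in rows)


rows = parse_table(SAMPLE_ROWS)
funding = tally(rows, "Recommended funding change")
for rating, count in funding.most_common():
    print(f"{rating}: {count}")
```

Passing the full table text to `parse_table` instead of `SAMPLE_ROWS` would give per-column counts for all rows; the same `tally` call works for any of the rating columns, such as "Current adoption level".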