AI safety organization incubated by the Future of Life Foundation (FLF) and led by Richard Mallah, FLI's Principal AI Safety Strategist from 2014. Focus areas include risk assessment, policy strategy, and technical safety. Based in Calabasas, CA, and fiscally sponsored by Social & Environmental Entrepreneurs, Inc. Anthony Aguirre has described it as "the one thing FLF has sort of fully launched." The team numbers 12+ people; advisors include Anthony Aguirre and Eric Drexler.
Safety Agendas
- Anthropic Core Views (Safety Agenda): Anthropic allocates 15-25% of R&D (~$100-200M annually) to safety research, including the world's largest interpretability team (40-60 researchers), while maintaining $5B+ revenue by 2025. Their RSP... (Quality: 62/100); the implied total R&D budget is sketched after this list.
- Prosaic Alignment (Safety Agenda): Aligning AI systems using current deep learning techniques without fundamental new paradigms.
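As a quick sanity check on the first entry's figures: if $100-200M in annual safety spending is 15-25% of R&D, the total R&D budget is pinned to a bounded range. A minimal arithmetic sketch, assuming only the numbers stated above (the variable names are illustrative, not from any source):

```python
# Back-of-envelope check of the implied total R&D budget, assuming
# safety spend of $100-200M is 15-25% of total R&D (figures from the
# entry above; everything else here is illustrative).

safety_spend_musd = (100, 200)   # reported annual safety R&D, $M
safety_share = (0.15, 0.25)      # reported share of total R&D

# Smallest implied budget: low spend at the high share.
low = safety_spend_musd[0] / safety_share[1]   # 100 / 0.25 = 400
# Largest implied budget: high spend at the low share.
high = safety_spend_musd[1] / safety_share[0]  # 200 / 0.15 ~= 1333

print(f"Implied total R&D: ${low:.0f}M to ${high:.0f}M per year")
```

The bounds come from pairing the low spend with the high share and vice versa, the widest range consistent with both stated intervals.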
Approaches
- AI Alignment (Approach): Comprehensive review of AI alignment approaches, finding that current methods (RLHF, Constitutional AI) show 75%+ effectiveness on measurable safety metrics for existing systems but face critical scalability... (Quality: 91/100)
- AI-Human Hybrid Systems (Approach): Hybrid AI-human systems achieve 15-40% error reduction across domains through six design patterns, with evidence from Meta (23% false positive reduction), Stanford Healthcare (27% diagnostic improvement...) (Quality: 91/100)
Analysis
- Short AI Timeline Policy Implications (Analysis): Analyzes how AI policy priorities shift under 1-5 year timelines to transformative AI, arguing that interventions requiring less than 2 years (lab safety practices, compute monitoring, emergency co... (Quality: 62/100)
- Alignment Robustness Trajectory Model (Analysis): This model estimates that alignment robustness degrades from 50-65% at GPT-4 level to 15-30% at 100x capability, with a critical 'alignment valley' at 10-30x where systems are dangerous but can't help s... (Quality: 64/100); the trajectory's shape is sketched after this list.
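To make the second entry's trajectory concrete, here is a log-linear interpolation between its two anchor points, using the midpoints of the stated ranges (~57.5% at 1x, ~22.5% at 100x). This is a hypothetical illustration of the shape only; the entry does not specify the model's actual functional form:

```python
import math

# Hypothetical log-linear interpolation between the entry's anchors:
# ~57.5% robustness at 1x (GPT-4 level), ~22.5% at 100x capability.
# Assumes robustness falls linearly in log10(capability).

def robustness_midpoint(capability_multiplier: float) -> float:
    r_1x, r_100x = 0.575, 0.225          # midpoints of 50-65% and 15-30%
    t = math.log10(capability_multiplier) / math.log10(100.0)
    return r_1x + t * (r_100x - r_1x)

for x in (1, 10, 30, 100):
    r = robustness_midpoint(x)
    valley = "  <- 'alignment valley' (10-30x)" if 10 <= x <= 30 else ""
    print(f"{x:>4}x capability: ~{r:.0%} robustness{valley}")
```

Linearity in log10 is an assumption; the point is only that the 10-30x 'valley' sits well below GPT-4-level robustness while capability is already an order of magnitude higher.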
Historical
- Anthropic-Pentagon Standoff (2026) (Event): Comprehensive analysis of the February 2026 confrontation between Anthropic and the US government, triggered when Claude AI was used in the January 2026 Venezuela raid via Palantir; Anthropic refused... (Quality: 70/100)
Other
- David Sacks (Person): David Sacks, as White House AI and Crypto Czar, represents a significant policy force advocating for minimal AI regulation while dismissing AI safety concerns as 'fear-mongering' and regulatory capture... (Quality: 65/100)
- Value Learning (Research Area): Training AI systems to infer and adopt human values from observation and interaction. (Quality: 59/100)
- RLHF (Research Area): RLHF/Constitutional AI achieves 82-85% preference improvements and 40.8% adversarial attack reduction for current systems, but faces fundamental scalability limits: weak-to-strong supervision shows... (Quality: 63/100)
Key Debates
- Why Alignment Might Be Hard (Argument): A comprehensive taxonomy of alignment difficulty arguments spanning specification problems, inner alignment failures, verification limits, and adversarial dynamics, with expert p(doom) estimates ranging... (Quality: 69/100)
- AI Alignment Research Agendas (Crux): Comprehensive comparison of major AI safety research agendas ($100M+ Anthropic, $50M+ DeepMind, $5-10M nonprofits) with detailed funding, team sizes, and failure mode coverage (25-65% per agenda). ... (Quality: 69/100)
- Why Alignment Might Be Easy (Argument): Synthesizes empirical evidence that alignment is tractable, citing 29-41% RLHF improvements, Constitutional AI reducing bias across 9 dimensions, millions of interpretable features from Claude 3, and... (Quality: 53/100)
Risks
- Epistemic Sycophancy (Risk): AI sycophancy, where models agree with users rather than provide accurate information, affects all five state-of-the-art models tested, with medical AI showing 100% compliance with illogical requests... (Quality: 60/100)
- Scheming (Risk): Scheming, strategic AI deception during training, has transitioned from theoretical concern to observed behavior across all major frontier models (o1: 37% alignment faking; Claude: 14% harmful compliance... (Quality: 74/100)
Organizations
- US AI Safety Institute (Organization): The US AI Safety Institute (AISI), established November 2023 within NIST with a $10M budget (FY2025 request: $82.7M), conducted pre-deployment evaluations of frontier models through MOUs with OpenAI and... (Quality: 91/100)
- Safe Superintelligence Inc. (Organization): Safe Superintelligence Inc. represents a significant AI safety organization founded by key OpenAI alumni with $3B in funding and a singular focus on developing safe superintelligence, though its actual... (Quality: 45/100)
- Cambridge Boston Alignment Initiative (Organization): Regional AI alignment research and community organization based in the Cambridge/Boston area.
Concepts
- Agentic AI (Capability): Analysis of agentic AI capabilities and deployment challenges, documenting industry forecasts (40% of enterprise apps by 2026, $199B market by 2034) alongside implementation difficulties (40%+ proj... (Quality: 68/100)
- Self-Improvement and Recursive Enhancement (Capability): Comprehensive analysis of AI self-improvement, from current AutoML systems (23% training speedups via AlphaEvolve) to theoretical intelligence explosion scenarios, with expert consensus at ~50% probability... (Quality: 69/100); the compounding dynamic is sketched after this list.
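The recursive-enhancement dynamic in the second entry can be illustrated with simple compounding: if each self-improvement cycle yields a fractional speedup like the 23% figure cited, cycle times shrink geometrically. A minimal sketch under a hypothetical assumption the entry does not make, namely that the same per-cycle gain repeats:

```python
# Illustrative compounding of per-cycle speedups in a recursive
# self-improvement loop. The 23% figure echoes the AlphaEvolve speedup
# cited above; assuming it recurs every cycle is purely hypothetical.

def cycle_times(initial_months: float, speedup: float, cycles: int) -> list[float]:
    """Duration of each successive improvement cycle, if every cycle
    makes the next one `speedup` fraction faster."""
    times = [initial_months]
    for _ in range(cycles - 1):
        times.append(times[-1] / (1 + speedup))
    return times

times = cycle_times(initial_months=12.0, speedup=0.23, cycles=6)
for i, t in enumerate(times, 1):
    print(f"cycle {i}: {t:.1f} months")
print(f"total: {sum(times):.1f} months (vs {12.0 * 6:.0f} without compounding)")
```

Even this modest constant gain cuts six cycles from 72 to roughly 46 months; intelligence-explosion scenarios posit gains that grow per cycle rather than stay fixed.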