Anthropic Core Views (Safety Agenda): Anthropic allocates 15-25% of R&D (~$100-200M annually) to safety research, including the world's largest interpretability team (40-60 researchers), while maintaining $5B+ revenue by 2025. Their RSP... Quality: 62/100
Risks
Deceptive Alignment (Risk): Comprehensive analysis of deceptive alignment risk, where AI systems appear aligned during training but pursue different goals when deployed. Expert probability estimates range from 5-90%, with key empir... Quality: 75/100
AI-Induced Irreversibility (Risk): Comprehensive analysis of irreversibility in AI development, distinguishing between decisive catastrophic events and accumulative risks through gradual lock-in. Quantifies current trends (60-70% al... Quality: 64/100
AI Capability Sandbagging (Risk): Systematically documents sandbagging (strategic underperformance during evaluations) across frontier models, finding 70-85% detection accuracy with white-box probes, 18-24% accuracy drops on autono... Quality: 67/100
Emergent Capabilities (Risk): Emergent capabilities, abilities that appear suddenly at scale without explicit training, pose high unpredictability risks. Wei et al. documented 137 emergent abilities; recent models show step-functio... Quality: 61/100
Automation Bias (AI Systems) (Risk): Comprehensive review of automation bias showing physician accuracy drops from 92.8% to 23.6% with incorrect AI guidance, 78% of users accept AI outputs without scrutiny, and LLM hallucination rates... Quality: 56/100
Concepts
Agentic AI (Capability): Analysis of agentic AI capabilities and deployment challenges, documenting industry forecasts (40% of enterprise apps by 2026, $199B market by 2034) alongside implementation difficulties (40%+ proj... Quality: 68/100
Self-Improvement and Recursive Enhancement (Capability): Comprehensive analysis of AI self-improvement, from current AutoML systems (23% training speedups via AlphaEvolve) to theoretical intelligence explosion scenarios, with expert consensus at ~50% prob... Quality: 69/100
Organizations
Center for AI Safety (Organization): CAIS is a nonprofit research organization founded by Dan Hendrycks that has distributed compute grants to researchers and published technical AI safety papers, including the representation engineering ... Quality: 42/100
Astralis Foundation (Organization): Astralis Foundation is a Swedish philanthropic organization focused on AI safety and governance. Quality: 30/100
Other
Geoffrey Hinton (Person): Comprehensive biographical profile of Geoffrey Hinton documenting his 2023 shift from AI pioneer to safety advocate and his estimate of 10-20% extinction risk within 5-20 years. Covers his media strategy, poli... Quality: 42/100
Toby Ord (Person): Comprehensive biographical profile of Toby Ord documenting his 10% AI extinction estimate and his role in founding effective altruism, with detailed tables on risk assessments, academic background, and in... Quality: 41/100
Yoshua Bengio (Person): Comprehensive biographical overview of Yoshua Bengio's transition from deep learning pioneer (Turing Award 2018) to AI safety advocate, documenting his 2020 pivot at Mila toward safety research, co... Quality: 39/100
Chris Olah (Person): Comprehensive biographical profile of Chris Olah covering his unconventional career path and foundational contributions to mechanistic interpretability (feature visualization, circuit analysis, sparse... Quality: 27/100
Neel Nanda (Person): Comprehensive biographical profile of Neel Nanda covering his role as DeepMind's mechanistic interpretability team lead, key contributions (TransformerLens, Gemma Scope, grokking paper), and his ev... Quality: 26/100
Ilya Sutskever (Person): Biographical overview of Ilya Sutskever's career trajectory from deep learning researcher (AlexNet, seq2seq, dropout) to co-founding Safe Superintelligence Inc. in 2024 after leaving OpenAI. Docume... Quality: 26/100