- Anthropic Core Views (Safety Agenda): Anthropic allocates 15-25% of R&D (~$100-200M annually) to safety research, including the world's largest interpretability team (40-60 researchers), while maintaining $5B+ revenue by 2025. Their RSP... (Quality: 62/100)
Risks
- Scheming (Risk): Scheming—strategic AI deception during training—has transitioned from theoretical concern to observed behavior across all major frontier models (o1: 37% alignment faking, Claude: 14% harmful compli... (Quality: 74/100)
- Deceptive Alignment (Risk): Comprehensive analysis of deceptive alignment risk where AI systems appear aligned during training but pursue different goals when deployed. Expert probability estimates range 5-90%, with key empir... (Quality: 75/100)
- AI-Induced Irreversibility (Risk): Comprehensive analysis of irreversibility in AI development, distinguishing between decisive catastrophic events and accumulative risks through gradual lock-in. Quantifies current trends (60-70% al... (Quality: 64/100)
- AI Capability Sandbagging (Risk): Systematically documents sandbagging (strategic underperformance during evaluations) across frontier models, finding 70-85% detection accuracy with white-box probes, 18-24% accuracy drops on autono... (Quality: 67/100)
- Emergent Capabilities (Risk): Emergent capabilities—abilities appearing suddenly at scale without explicit training—pose high unpredictability risks. Wei et al. documented 137 emergent abilities; recent models show step-functio... (Quality: 61/100)
- Automation Bias (AI Systems) (Risk): Comprehensive review of automation bias showing physician accuracy drops from 92.8% to 23.6% with incorrect AI guidance, 78% of users accept AI outputs without scrutiny, and LLM hallucination rates... (Quality: 56/100)
Organizations
- Anthropic (Organization): Comprehensive reference page on Anthropic covering financials ($380B valuation, $19B ARR), safety research (Constitutional AI, mechanistic interpretability, model welfare), governance (LTBT structu... (Quality: 74/100)
- Stanford University (Organization): Private research university in Stanford, California. Home to the Stanford Institute for Human-Centered Artificial Intelligence (HAI) and the Center for International Security and Cooperation (CISAC).
- SaferAI (Organization): Organization focused on AI safety standards and evaluation frameworks.
- London Initiative for Safe AI (Organization): London-based AI safety initiative focused on policy engagement.
- BlueDot Impact (Organization): Organization known for its AI Safety Fundamentals and AI Governance courses.
- Encode Justice (Organization): Youth-led AI accountability organization. Co-sponsor of California SB 1047.
Other
- Oliver Sourbut (Person): Researcher and AI specialist at the Future of Life Foundation. Former UK AI Safety Institute researcher. Engaged with OECD, UK FCDO, and DSIT on AI governance. Researched agent oversight at Oxford...
- Emilia Javorsky (Person): Director of the Futures Program at the Future of Life Institute (FLI). A scientist and inventor with extensive experience in biomedical research and technology development. Leads FLI's work on guid...
- Seán Ó hÉigeartaigh (Person): Founding Executive Director of the Centre for the Study of Existential Risk (CSER) at the University of Cambridge, leading CSER from its establishment in 2013 through 2020. Led the development of C...
Concepts
- Agentic AI (Capability): Analysis of agentic AI capabilities and deployment challenges, documenting industry forecasts (40% of enterprise apps by 2026, $199B market by 2034) alongside implementation difficulties (40%+ proj... (Quality: 68/100)
- Self-Improvement and Recursive Enhancement (Capability): Comprehensive analysis of AI self-improvement from current AutoML systems (23% training speedups via AlphaEvolve) to theoretical intelligence explosion scenarios, with expert consensus at ~50% prob... (Quality: 69/100)