
Anthropic

Company Blog

AI safety company and developer of Claude.

Credibility Rating: 4/5 (High)
High quality. Established institution or organization with editorial oversight and accountability.
Resources: 81
Citing Pages: 130
Tracked Domains: 1

Tracked Domains

anthropic.com

Resources (81)

Citing Pages (130)

AI Accident Risk Cruxes · Agentic AI · AGI Development · AGI Timeline · AI-Assisted Alignment · AI Control · AI Alignment · Alignment Evaluations · Alignment Robustness Trajectory Model · Anthropic · Anthropic Core Views · Anthropic IPO · Anthropic Pre-IPO DAF Transfers · Anthropic Stakeholders · Apollo Research · Alignment Research Center · Autonomous Weapons Escalation Model · Bioweapons Risk · AI Uplift Assessment Model · Bioweapons Attack Chain Model · Center for AI Safety · Capabilities-to-Safety Pipeline Model · Capability Elicitation · AI Capability Threshold Model · Carlsmith's Six-Premise Argument · The Case For AI Existential Risk · Autonomous Coding · AI Compounding Risks Analysis Model · AI-Driven Concentration of Power · Constitutional AI · Corrigibility · Corrigibility Failure · Corrigibility Failure Pathways · Cyberweapons Risk · Autonomous Cyber Attack Timeline · Daniela Amodei · Dario Amodei · AI Safety via Debate · Deceptive Alignment · Deceptive Alignment Decomposition Model · AI Safety Defense in Depth Model · AI-Assisted Deliberation · AI Disinformation · AI Policy Effectiveness · AI-Induced Enfeeblement · Epistemic Sycophancy · EU AI Act · Eval Saturation & The Evals Gap · AI Evaluation · Evan Hubinger · Goal Misgeneralization · Goal Misgeneralization Probability Model · Goal Misgeneralization Research · AI Governance and Policy · Heavy Scaffolding / Agentic Systems · Holden Karnofsky · AI-Human Hybrid Systems · Instrumental Convergence · Instrumental Convergence Framework · Interpretability · AI Safety Intervention Effectiveness Matrix · AI Knowledge Monopoly · AI Lab Safety Culture · Large Language Models · AI Value Lock-in · Long-Horizon Autonomous Tasks · Mechanistic Interpretability · Mesa-Optimization · Mesa-Optimization Risk Analysis · Metaculus · Machine Intelligence Research Institute · AI Misuse Risk Cruxes · AI Model Specifications · Multipolar Trap Dynamics Model · Open Source AI Safety · OpenAI · Optimistic Alignment Worldview · Paul Christiano · Pause Advocacy · Should We Pause AI Development? · Persuasion and Social Manipulation · Power-Seeking AI · Power-Seeking Emergence Conditions Model · Pre-TAI Capital Deployment: $100B-$300B+ Spending Analysis · Probing / Linear Probes · AI Proliferation · AI Development Racing Dynamics · Racing Dynamics Impact Model · Reasoning and Planning · Red Teaming · Redwood Research · Responsible Scaling Policies (RSPs) · Reward Hacking · Reward Hacking Taxonomy and Severity Model · AI Risk Activation Timeline Model · AI Risk Cascade Pathways Model · AI Risk Interaction Matrix · AI Risk Interaction Network Model · RLHF · Responsible Scaling Policies · AI Safety Culture Equilibrium Model · AI Safety Research Allocation Model · AI Safety Research Value Model · AI Safety Researcher Gap Model · AI Capability Sandbagging · Sandboxing / Containment · Scalable Oversight · AI Scaling Laws · Scheming · Scheming & Deception Detection · Scheming Likelihood Assessment · Survival and Flourishing Fund · Sharp Left Turn · Situational Awareness · Sleeper Agent Detection · AI Safety Solution Cruxes · State-Space Models / Mamba · AI Model Steganography · Structured Access / API-Only · Sycophancy · Sycophancy Feedback Loop Model · AI Safety Technical Pathway Decomposition · Technical AI Safety Research · Treacherous Turn · Voluntary AI Safety Commitments · AI Risk Warning Signs Model · Why Alignment Might Be Easy · Why Alignment Might Be Hard · Worldview-Intervention Mapping
Publication ID: anthropic