Prosaic Alignment (Safety Agenda)
Aligning AI systems using current deep learning techniques without fundamental new paradigms
Risks
Epistemic Sycophancy (Risk): AI sycophancy, where models agree with users rather than provide accurate information, affects all five state-of-the-art models tested, with medical AI showing 100% compliance with illogical requests...
Analysis
Alignment Robustness Trajectory Model (Analysis): Estimates that alignment robustness degrades from 50-65% at GPT-4 level to 15-30% at 100x capability, with a critical "alignment valley" at 10-30x where systems are dangerous but can't help s...
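To make the headline numbers above concrete, the sketch below joins the midpoints of the stated ranges with a simple log-capability interpolation. This is illustrative only, not the model's actual code: the anchor midpoints, interpolation scheme, and all names are assumptions made here, and a monotone interpolation cannot capture the non-monotone "alignment valley" at 10-30x.

```python
import numpy as np

# Illustrative only: midpoints of the summary's ranges (50-65% at 1x GPT-4,
# 15-30% at 100x) joined by log-linear interpolation. The model's actual
# functional form, and its 10-30x "alignment valley" dip, are not reproduced.
ANCHOR_LOG_CAPABILITY = np.log10([1.0, 100.0])   # capability as a multiple of GPT-4
ANCHOR_ROBUSTNESS_PCT = [57.5, 22.5]             # midpoints of 50-65% and 15-30%

def robustness_estimate(capability_multiple: float) -> float:
    """Interpolated alignment-robustness percentage at a given capability multiple."""
    return float(np.interp(np.log10(capability_multiple),
                           ANCHOR_LOG_CAPABILITY, ANCHOR_ROBUSTNESS_PCT))

for c in (1, 10, 30, 100):
    print(f"{c:>4}x capability -> ~{robustness_estimate(c):.0f}% robustness")
```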
Key Debates
AI Alignment Research Agendas (Crux): Comprehensive comparison of major AI safety research agendas ($100M+ Anthropic, $50M+ DeepMind, $5-10M nonprofits) with detailed funding, team sizes, and failure mode coverage (25-65% per agenda...

Why Alignment Might Be Easy (Argument): Synthesizes empirical evidence that alignment is tractable, citing 29-41% RLHF improvements, Constitutional AI reducing bias across 9 dimensions, millions of interpretable features from Claude 3, a...
Concepts
RLHF (Capability): RLHF/Constitutional AI achieves 82-85% preference improvements and 40.8% adversarial attack reduction for current systems, but faces fundamental scalability limits: weak-to-strong supervision shows...
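For readers new to this entry, the "preference improvements" above refer to optimizing a learned reward model. The standard pairwise objective from the RLHF literature (the general formulation, not code or numbers specific to this page) trains the reward model $r_\theta$ on human comparisons by minimizing

$$
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right],
$$

where $x$ is a prompt, $y_w$ and $y_l$ are the human-preferred and dispreferred responses, and $\sigma$ is the logistic function; the policy is then fine-tuned, typically with PPO plus a KL penalty to the base model, to score highly under $r_\theta$.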
Organizations
Safe Superintelligence Inc. (Organization): Safe Superintelligence Inc. represents a significant AI safety organization founded by key OpenAI alumni with $3B funding and a singular focus on developing safe superintelligence, though its actua...

Cambridge Boston Alignment Initiative (Organization): Regional AI alignment research and community organization based in the Cambridge/Boston area.

Alignment Research Engineer Accelerator (Organization): Training program that upskills software engineers to become alignment researchers.

Apart Research (Organization): AI safety research organization running hackathons, fellowships, and collaborative research projects.

Pivotal Research (Organization): Research organization working on AI safety and alignment research.
Safety Research
AI Value Learning (Safety Agenda): Training AI systems to infer and adopt human values from observation and interaction.
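As a concrete illustration of "inferring values from observation", here is a toy sketch, with an entirely invented setup that is not any particular agenda's method: a reward function is fit by maximum likelihood to choices drawn from a Boltzmann-rational chooser.

```python
import numpy as np

# Toy value-learning sketch (all setup here is invented for illustration):
# a simulated "human" picks among three options with probability
# softmax(true_reward); we recover an estimated reward by maximum
# likelihood, which is identifiable only up to an additive constant.
rng = np.random.default_rng(0)
true_reward = np.array([1.0, 0.0, -1.0])

def softmax(r):
    e = np.exp(r - r.max())
    return e / e.sum()

choices = rng.choice(3, size=500, p=softmax(true_reward))
counts = np.bincount(choices, minlength=3)

r_hat = np.zeros(3)
for _ in range(2000):
    grad = counts - counts.sum() * softmax(r_hat)  # gradient of the log-likelihood
    r_hat += 0.001 * grad

print("true reward (centered):     ", true_reward - true_reward.mean())
print("estimated reward (centered):", np.round(r_hat - r_hat.mean(), 2))
```

The same maximum-likelihood structure underlies inverse reinforcement learning, where the options are trajectories rather than single choices.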