A nonprofit organization investigating the offensive capabilities and controllability of frontier AI models through empirical research on autonomous hacking, shutdown resistance, and agentic misalignment.
Related Wiki Pages
AI Alignment
Technical approaches to ensuring AI systems pursue intended goals and remain aligned with human values throughout training and deployment. Current ...
Anthropic
An AI safety company founded by former OpenAI researchers that develops frontier AI models while pursuing safety research, including the Claude mod...
Survival and Flourishing Fund (SFF)
SFF is a donor-advised fund financed primarily by Jaan Tallinn (Skype co-founder, ~$900M net worth) that uses a unique S-process simulation mechan...
Dario Amodei
CEO of Anthropic advocating competitive safety development philosophy with Constitutional AI, responsible scaling policies, and empirical alignment...
Yoshua Bengio
Turing Award winner and deep learning pioneer who became a prominent AI safety advocate, co-founding safety research initiatives at Mila and co-sig...