Provably Safe AI (davidad agenda)

An ambitious research agenda to design AI systems with mathematical safety guarantees from the ground up. Pursued through ARIA's £59m Safeguarded AI programme, directed by davidad (David A. Dalrymple), it aims to create superintelligent systems that are provably beneficial, using formal verification of world models and value specifications.
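At the heart of the agenda is a "gatekeeper" pattern: an untrusted policy proposes actions, and an action is executed only if it carries a machine-checkable certificate that, relative to a formal world model, it satisfies the safety specification. The sketch below is a minimal illustration of that pattern in Python under toy assumptions (a one-dimensional world state, a trivial spec, and certificate checking by re-simulation rather than proof checking); all names are illustrative, not the programme's actual implementation.

```python
from dataclasses import dataclass

# Toy world state: e.g. a safety margin that must stay non-negative.
State = float

@dataclass
class Certificate:
    """Machine-checkable witness that an action is safe: a toy
    stand-in for the formal proof object the agenda envisions."""
    predicted_next: State

def world_model(state: State, action: float) -> State:
    """Toy deterministic world model: the action shifts the state."""
    return state + action

def safety_spec(state: State) -> bool:
    """Formal safety specification: the state must remain non-negative."""
    return state >= 0.0

def check_certificate(state: State, action: float, cert: Certificate) -> bool:
    """Check the certificate against the world model and the spec.
    Here this is re-simulation; in the agenda it would be proof checking."""
    next_state = world_model(state, action)
    return next_state == cert.predicted_next and safety_spec(next_state)

def gatekeeper(state: State, action: float, cert: Certificate) -> State:
    """Execute a proposed action only if its certificate verifies."""
    if not check_certificate(state, action, cert):
        raise ValueError("action rejected: safety certificate failed to verify")
    return world_model(state, action)

if __name__ == "__main__":
    s = 1.0
    # An honestly certified, spec-satisfying action passes the gate.
    s = gatekeeper(s, -0.5, Certificate(predicted_next=0.5))
    # An action leading outside the spec is rejected before execution.
    try:
        gatekeeper(s, -2.0, Certificate(predicted_next=-1.5))
    except ValueError as err:
        print(err)
```

The load-bearing property is that the checker is far simpler than the policy: `check_certificate` can be trusted even when the system that proposed the action cannot.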

Related Pages

Risks

Deceptive Alignment · Goal Misgeneralization · Power-Seeking AI

Approaches

AI Safety Cases · Sleeper Agent Detection

Other

Max Tegmark · Yoshua Bengio · Stuart Russell

Concepts

Alignment Theoretical Overview · Provable / Guaranteed Safe AI

Tags

formal-methods · mathematical-guarantees · world-modeling · value-specification · aria-programme · long-term-research