# Safeguarded AI: constructing guaranteed safety
## Programme thesis
v1.2 David “davidad” Dalrymple, Programme Director
## CONTEXT
An ARIA programme seeks to unlock a scientific or technical capability that:

- changes the perception of what's possible or valuable
- has the potential to catalyse massive social and economic returns
- is unlikely to be achieved without ARIA's intervention

This document presents the core thesis underpinning a programme that has now launched. Sign up here to receive all updates about this opportunity space and see the programme here.
## UPDATE: OUR THINKING, EVOLVED
_A summary capturing the evolution of our thinking since publication._
Since publishing v1.0 of this thesis in February 2024, we have invited public feedback on our ideas and engaged with experts to challenge and refine our thinking. The following key learnings have emerged from that process so far, and have been incorporated into the programme:
- Understanding the needs of application areas is critical early. In response to feedback about de-risking applicability, we have chosen to stage the TA3 Applications Phase 1 solicitation second (immediately following TA1.1 Theory) instead of last, so that TA3 creators can inform use-inspired assessments and requirements of other TAs sooner rather than later. This is reflected in the updated thesis below (in the "How we expect to fund" section on page 14).
- We have updated the structure of TA2. Since we expect to fund TA2 work as one award to a single major R&D effort, we have removed the formal subareas within TA2, and instead highlight four core technical objectives. Compared to the earlier version, we have added a technical objective for developing AI systems which estimate conditional probabilities with respect to a world model (see the sketch after this list). We realised that adding this element is required to capture the full scope of approaches we are interested in exploring in TA2, which is now broadened beyond sound bounds to include estimates with merely asymptotic correctness guarantees if they incorporate adversarial training. In the framework of \[12, fig. 4, sec. 3.4\], this corresponds to verification levels V7–V9.
- We moved the work on sociotechnical integration from TA2 into TA1, as TA1.4. We believe this better captures the role we see this work playing. In particular:
  - We see this work as an important part of the necessary scaffolding for successful applications of Safeguarded AI, which is the overarching goal of TA1.
  - We expect to fund multiple academic groups with both overlapping and complementary strengths to investigate these questions through open-access work, more like TA1.1 than like the single effort of TA2.
- We have developed our thinking on potential international cooperation to help realise the full potential of Safeguarded AI workflows. This might include joint workshops, joint working groups, coordinated funding, co-funding, and information-sharing arrangements. We share more details on our thinking in …
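To make the new TA2 objective more concrete, here is a minimal, purely illustrative sketch (not from the thesis) of what "estimating a conditional probability with respect to a world model" can look like in the simplest Monte Carlo setting. The toy world model, its variables, and the safety threshold are all hypothetical, and the estimator carries only an asymptotic correctness guarantee: it converges to the true probability as the sample count grows, which is weaker than the sound worst-case bounds TA2 also targets.

```python
# Purely illustrative, hypothetical sketch: Monte Carlo estimation of a
# conditional probability with respect to a toy probabilistic world model.
# None of these names or numbers come from the programme thesis.
import random

def world_model(action: float) -> float:
    """Hypothetical stochastic world model: returns a sampled outcome
    (e.g. a hazard level) given an action."""
    return action + random.gauss(0.0, 1.0)

def estimate_conditional_probability(action: float,
                                     threshold: float,
                                     n_samples: int = 100_000) -> float:
    """Monte Carlo estimate of P(outcome <= threshold | action).
    Only asymptotically correct: the estimate converges to the true
    probability as n_samples grows, with no sound finite-sample bound."""
    hits = sum(world_model(action) <= threshold for _ in range(n_samples))
    return hits / n_samples

if __name__ == "__main__":
    # Estimate the probability that the hazard stays below 1.0 given action 0.0.
    p = estimate_conditional_probability(action=0.0, threshold=1.0)
    print(f"Estimated P(outcome <= 1.0 | action = 0.0): {p:.3f}")
```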