Longterm Wiki

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Future of Life Institute

This page archives FLI's 2018 AGI Safety Grant Program, documenting funded projects in technical AI safety and governance and providing a snapshot of the field's research priorities at that time.

Metadata

Importance: 52/100 · Tags: organizational report, reference

Summary

The Future of Life Institute's 2018 grant program funded AI safety research across technical and governance domains. Projects included ad hoc teamwork with moral feedback for safe robots, iterated distillation and amplification for scalable alignment, and academic governance frameworks for AGI. The archive documents grant recipients, amounts, and technical abstracts.

Key Points

  • Peter Stone (UT Austin) received $200K to develop M-TAMER, teaching agents human morality via feedback in hybrid human-agent teams.
  • Owain Evans (Oxford) received $225K to empirically study 'factored cognition' as a foundation for Christiano's Iterated Distillation and Amplification (IDA).
  • Allan Dafoe (Yale) received $276K for a Governance of AI Programme addressing institutional design for safe AGI development.
  • The program reflects early FLI investment in both technical alignment (scalable oversight, value learning) and AI governance research.
  • The median AI researcher surveyed believed there was at least a 5% chance of an extinction-level negative outcome from AGI, motivating governance work.

Cached Content Preview

HTTP 200 · Fetched Apr 11, 2026 · 15 KB
2018 AGI Safety Grant Program

 FLI's second grant program focused on AI safety. Status: Completed 

 Grants archive

An archive of all grants provided within this grant program:

Project title: Ad hoc Teamwork and Moral Feedback as a Framework for Safe Robot Behavior
Amount recommended: $200,000.00
Primary investigator: Peter Stone, University of Texas

Technical Abstract

As technology develops, it is only a matter of time before agents will be capable of long term (general purpose) autonomy, i.e., will need to choose their actions by themselves for a long period of time. Thus, in many cases agents will not be able to be coordinated in advance with all other agents with which they may interact. Instead, agents will need to cooperate in order to accomplish unanticipated joint goals without pre-coordination. As a result, the "ad hoc teamwork" problem, in which teammates must work together to obtain a common goal without any prior agreement regarding how to do so, has emerged as a recent area of study in the AI literature. However, to date, no attention has been dedicated to the moral aspect of the agents' behavior. In this research, we introduce the M-TAMER framework (a novel variant of TAMER) used to teach agents the idea of human morality. Using a hybrid team (agents and people), if taking an action considered to be morally bad, the agents will receive negative feedback from the human teammate(s). Using M-TAMER, agents will be able to develop an "inner-conscience" which will enable them to act consistently with human morality.
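The abstract gives no implementation detail, but the interaction loop it builds on, TAMER-style learning in which an agent models human feedback signals and acts to maximize them, can be sketched roughly as below. This is a minimal, generic sketch: the tabular learner, the action names, and the feedback values are illustrative assumptions, not the M-TAMER design.

```python
import random
from collections import defaultdict

class TamerStyleAgent:
    """Illustrative TAMER-style learner: it estimates the human feedback
    expected for each (state, action) pair and acts to maximize that
    estimate. A generic sketch only, not the proposed M-TAMER framework."""

    def __init__(self, actions, learning_rate=0.1, exploration=0.1):
        self.actions = actions
        self.lr = learning_rate
        self.eps = exploration
        self.h = defaultdict(float)  # predicted human feedback per (state, action)

    def act(self, state):
        # Mostly exploit the feedback model, occasionally explore.
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.h[(state, a)])

    def give_feedback(self, state, action, signal):
        # A human teammate scores the chosen action, e.g. -1 for a morally
        # unacceptable action and +1 for an acceptable one.
        self.h[(state, action)] += self.lr * (signal - self.h[(state, action)])

# Toy usage: the teammate penalizes the hypothetical "push_human" action.
agent = TamerStyleAgent(actions=["assist", "wait", "push_human"])
for _ in range(200):
    action = agent.act("hallway")
    feedback = -1.0 if action == "push_human" else 1.0
    agent.give_feedback("hallway", action, feedback)
print(agent.act("hallway"))  # with high probability, a non-penalized action
```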

Project title: Factored Cognition: Amplifying Human Cognition for Safely Scalable AGI
Amount recommended: $225,000.00
Primary investigator: Owain Evans, Oxford University

Technical Abstract

 Our goal is to understand how Machine Learning can be used for AGI in a way that is 'safely scalable', i.e. becomes increasingly aligned with human interests as the ML components improve. Existing approaches to AGI (including RL and IRL) are arguably not safely scalable: the agent can become un-aligned once its cognitive resources exceed those of the human overseer. Christiano's Iterated Distillation and Amplification (IDA) is a promising alternative. In IDA, the human and agent are 'amplified' into a resourceful (but slow) overseer by allowing the human to make calls to the previous iteration of the agent. By construction, this overseer is intended to always stay ahead of the agent being overseen.
 Could IDA produce highly capable aligned agents given sufficiently advanced ML components? While we cannot directly get empirical evidence today, we can study it indirectly by running amplification with humans as stand-ins for AI. This corresponds to the study of 'factored cognition', the question of whether sophisticated reasoning can be broken down into many small and mostly independent sub-tasks. We will explore schemes for factored cognition empirically and exploit automation via ML to tackle
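The preview ends mid-abstract, but the amplification step it describes, an overseer that answers a hard question by making many calls to a weaker solver on smaller sub-questions, can be sketched roughly as below. The decompose/combine strategies and function names are illustrative placeholders, not the project's actual scheme; in the proposed experiments, humans would stand in for the ML solver.

```python
from typing import Callable, List

Solver = Callable[[str], str]

def weak_solver(question: str) -> str:
    # Stand-in for the previous iteration of the agent (or a time-limited
    # human worker). Here it only returns a stub answer.
    return f"best-effort answer to: {question}"

def decompose(question: str) -> List[str]:
    # Split the task into smaller, mostly independent sub-questions.
    # A real system would generate these; two generic ones are hard-coded.
    return [
        f"What facts are needed to answer '{question}'?",
        f"Given those facts, what is the answer to '{question}'?",
    ]

def combine(question: str, sub_answers: List[str]) -> str:
    # Aggregate the sub-answers into one final answer.
    return f"Answer to '{question}', based on: " + " | ".join(sub_answers)

def amplify(solver: Solver) -> Solver:
    # The amplified overseer: slower, but allowed many calls to the solver.
    def overseer(question: str) -> str:
        sub_answers = [solver(q) for q in decompose(question)]
        return combine(question, sub_answers)
    return overseer

print(amplify(weak_solver)("Is this deployment plan safe?"))
```

In full IDA, the amplified overseer would then be distilled into a faster agent, which replaces the weak solver on the next iteration.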

... (truncated, 15 KB total)
Resource ID: ac93a67c8aba6492