Longterm Wiki

Predictability and Surprise in Large Generative Models


Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This Anthropic paper examines the paradox of large generative models having predictable scaling laws but unpredictable emergent capabilities, with direct implications for AI safety governance and deployment risk assessment.

Metadata

Importance: 72/100 · conference paper · analysis

Summary

The paper identifies a core tension in large generative models: while their training loss follows predictable scaling laws, their specific capabilities, behaviors, and outputs remain difficult to anticipate. This unpredictability creates challenges for safe deployment and policy regulation. The authors provide empirical examples of harmful emergent behaviors and propose interventions for the AI community.

Key Points

  • Large generative models exhibit predictable aggregate loss (scaling laws) but unpredictable specific capabilities and outputs, creating a fundamental safety challenge.
  • The appearance of useful, predictable capabilities drives rapid development, while unpredictable qualities make it hard to anticipate deployment consequences.
  • The paper documents real-world and experimental examples of socially harmful behaviors arising from this unpredictability.
  • The authors analyze how this combination of predictable and unpredictable properties creates both strong incentives to deploy these models and barriers to deploying them safely.
  • The paper concludes with policy interventions for policymakers, funders, technologists, and academics to improve beneficial outcomes.
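The core contrast in the key points above can be made concrete with a minimal sketch. All constants below (the power-law coefficients and the capability threshold) are synthetic illustrations, not values from the paper: aggregate loss following a power law extrapolates almost exactly, while a capability that only "switches on" below a loss threshold (a toy stand-in for emergence) leaves no trace in the early runs used to make the prediction.

```python
# Hedged, synthetic sketch of "predictable loss, unpredictable capabilities".
# The constants a, b, and the threshold 4.0 are illustrative only.
import numpy as np

compute = np.logspace(0, 6, 20)   # synthetic compute budgets, 1 to 1e6

a, b = 10.0, 0.1                  # illustrative scaling-law constants
loss = a * compute ** (-b)        # aggregate loss: smooth power law

# Fit the power law on the first half of the runs, extrapolate to the rest.
slope, intercept = np.polyfit(np.log(compute[:10]), np.log(loss[:10]), 1)
predicted = np.exp(intercept) * compute ** slope

# The loss extrapolation is essentially exact...
assert np.allclose(predicted, loss)

# ...but a capability that only appears below a loss threshold shows no
# trace in the early runs the fit was built from.
capability = (loss < 4.0).astype(float)
print(capability[:10].sum(), capability.sum())  # absent early, present late
```

The fit sees only smooth, well-behaved points, yet the thresholded capability is entirely absent from the fitting window and fully present later, which is the deployment-risk asymmetry the paper highlights.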

Cached Content Preview

HTTP 200 · Fetched Apr 10, 2026 · 98 KB
Predictability and Surprise in Large Generative Models

 
 
Deep Ganguli (deep@anthropic.com), Danny Hernandez (danny@anthropic.com), Liane Lovitt (liane@anthropic.com), Nova DasSarma (nova@anthropic.com), Tom Henighan (henighan@anthropic.com), Andy Jones (andy@anthropic.com), Nicholas Joseph (nick@anthropic.com), Jackson Kernion (jackson@anthropic.com), Ben Mann (ben@anthropic.com), Amanda Askell (amanda@anthropic.com), Yuntao Bai (yuntao@anthropic.com), Anna Chen (anna@anthropic.com), Tom Conerly (tom.conerly@anthropic.com), Dawn Drain (dawn@anthropic.com), Nelson Elhage (nelhage@anthropic.com), Sheer El Showk (sheer@anthropic.com), Stanislav Fort (stan@anthropic.com), Zac Hatfield-Dodds (zac@anthropic.com), Scott Johnston (scott@anthropic.com), Shauna Kravec (shauna@anthropic.com), Neel Nanda (neel@anthropic.com), Kamal Ndousse (kamal@anthropic.com), Catherine Olsson (catherio@anthropic.com), Daniela Amodei (daniela@anthropic.com), Tom Brown (tom@anthropic.com), Jared Kaplan (jared@anthropic.com), Sam McCandlish (sam@anthropic.com), Chris Olah (colah@anthropic.com), Dario Amodei (dario@anthropic.com), and Jack Clark (jack@anthropic.com)

Anthropic, San Francisco, USA (2022)

 
 Abstract.

 Large-scale pre-training has recently emerged as a technique for creating capable, general-purpose, generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many others. In this paper, we highlight a counterintuitive property of such models and discuss the policy implications of this property. Namely, these generative models have a paradoxical combination of predictable loss on a broad training distribution (as embodied in their "scaling laws"), and unpredictable specific capabilities, inputs, and outputs. We believe that the high-level predictability and appearance of useful capabilities drives rapid development of such models, while the unpredictable qualities make it difficult to anticipate the consequences of model deployment. We go through examples of how this combination can lead to socially harmful behavior with examples from the literature and real world observations, and we also perform two novel experiments to illustrate our point about harms from unpredictability. Furthermore, we analyze how these conflicting properties combine to give model developers various motivations for deploying these models, and challenges that can hinder deployment. We conclude with a list of possible interventions the AI community may take to increase the chance of these models having a beneficial impact. We intend this paper to be useful to policymakers who want to understand and regulate AI systems, technologists who care about the potential policy impact of their work, funders who want to suppo

... (truncated, 98 KB total)
Resource ID: 370eb18a7329ea8b | Stable ID: sid_i4zmxJH2K5