Predictability and Surprise in Large Generative Models
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
This Anthropic paper examines the paradox of large generative models having predictable scaling laws but unpredictable emergent capabilities, with direct implications for AI safety governance and deployment risk assessment.
Summary
The paper identifies a core tension in large generative models: while their training loss follows predictable scaling laws, their specific capabilities, behaviors, and outputs remain difficult to anticipate. This unpredictability creates challenges for safe deployment and policy regulation. The authors provide empirical examples of harmful emergent behaviors and propose interventions for the AI community.
Key Points
- Large generative models exhibit predictable aggregate loss (scaling laws) but unpredictable specific capabilities and outputs, creating a fundamental safety challenge.
- The appearance of useful, predictable capabilities drives rapid development, while unpredictable qualities make it hard to anticipate deployment consequences.
- The paper documents real-world and experimental examples of socially harmful behaviors arising from this unpredictability.
- The authors analyze how the conflicting predictable/unpredictable properties create both incentives for deployment and barriers to safe release.
- The paper concludes with policy interventions for policymakers, funders, technologists, and academics to improve beneficial outcomes.
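The "predictable aggregate loss" in the first point refers to the empirical observation that training loss falls as a power law in scale, roughly L(N) ≈ c · N^(−α) for model size N. As a minimal sketch (not code from the paper), the snippet below fits such a power law to synthetic loss measurements via linear regression in log-log space; the constants `c = 5.0` and `alpha = 0.076` are hypothetical values chosen for illustration (the latter is in the ballpark of exponents reported in the scaling-laws literature).

```python
import numpy as np

def fit_scaling_law(sizes, losses):
    """Fit L = c * N**(-alpha) by linear regression in log-log space.

    Taking logs gives log L = log c - alpha * log N, a straight line,
    so an ordinary least-squares fit recovers (alpha, c).
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    return -slope, np.exp(intercept)  # (alpha, c)

# Synthetic data generated from a known power law, purely illustrative.
sizes = np.array([1e6, 1e7, 1e8, 1e9, 1e10])   # model parameter counts
losses = 5.0 * sizes ** -0.076                  # hypothetical loss curve
alpha, c = fit_scaling_law(sizes, losses)
print(round(alpha, 3), round(c, 1))
```

The paper's point is that this kind of fit extrapolates aggregate loss well, yet says nothing about which specific capabilities appear at a given scale.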
Cached Content Preview
Predictability and Surprise in Large Generative Models
Deep Ganguli (deep@anthropic.com), Danny Hernandez (danny@anthropic.com), Liane Lovitt (liane@anthropic.com), Nova DasSarma (nova@anthropic.com), Tom Henighan (henighan@anthropic.com), Andy Jones (andy@anthropic.com), Nicholas Joseph (nick@anthropic.com), Jackson Kernion (jackson@anthropic.com), Ben Mann (ben@anthropic.com), Amanda Askell (amanda@anthropic.com), Yuntao Bai (yuntao@anthropic.com), Anna Chen (anna@anthropic.com), Tom Conerly (tom.conerly@anthropic.com), Dawn Drain (dawn@anthropic.com), Nelson Elhage (nelhage@anthropic.com), Sheer El Showk (sheer@anthropic.com), Stanislav Fort (stan@anthropic.com), Zac Hatfield-Dodds (zac@anthropic.com), Scott Johnston (scott@anthropic.com), Shauna Kravec (shauna@anthropic.com), Neel Nanda (neel@anthropic.com), Kamal Ndousse (kamal@anthropic.com), Catherine Olsson (catherio@anthropic.com), Daniela Amodei (daniela@anthropic.com), Tom Brown (tom@anthropic.com), Jared Kaplan (jared@anthropic.com), Sam McCandlish (sam@anthropic.com), Chris Olah (colah@anthropic.com), Dario Amodei (dario@anthropic.com), and Jack Clark (jack@anthropic.com)

Anthropic, San Francisco, USA
(2022)
Abstract.
Large-scale pre-training has recently emerged as a technique for creating capable, general-purpose, generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many others. In this paper, we highlight a counterintuitive property of such models and discuss the policy implications of this property. Namely, these generative models have a paradoxical combination of predictable loss on a broad training distribution (as embodied in their "scaling laws"), and unpredictable specific capabilities, inputs, and outputs. We believe that the high-level predictability and appearance of useful capabilities drives rapid development of such models, while the unpredictable qualities make it difficult to anticipate the consequences of model deployment. We go through examples of how this combination can lead to socially harmful behavior with examples from the literature and real world observations, and we also perform two novel experiments to illustrate our point about harms from unpredictability. Furthermore, we analyze how these conflicting properties combine to give model developers various motivations for deploying these models, and challenges that can hinder deployment. We conclude with a list of possible interventions the AI community may take to increase the chance of these models having a beneficial impact. We intend this paper to be useful to policymakers who want to understand and regulate AI systems, technologists who care about the potential policy impact of their work, funders who want to suppo
... (truncated, 98 KB total)