Theory of Mind May Have Spontaneously Emerged in Large Language Models (Kosinski, 2023)
A widely cited but disputed 2023 paper claiming emergent theory of mind in GPT-4; important for discussions of unpredictable capability emergence and the difficulty of evaluating whether AI systems model human mental states, with direct implications for deception and manipulation risks.
Summary
Michal Kosinski's influential and controversial study argues that large language models, particularly GPT-4, spontaneously developed theory of mind (ToM), the ability to attribute mental states to others, as an emergent property of scale. The paper presents benchmark results suggesting GPT-4 solves nearly all (95%) of the classic false-belief tasks tested. These claims sparked significant debate about whether LLMs genuinely reason about mental states or merely exploit statistical patterns.
Key Points
- GPT-4 reportedly solved 95% of ToM tasks, comparable to 9-year-old human performance, despite not being explicitly trained for this capability.
- Theory of mind appears to have emerged spontaneously as model scale increased, suggesting capabilities can arise unpredictably from scaling alone.
- The findings are highly contested: critics argue LLMs may exploit training data contamination or surface-level patterns rather than genuine mental state reasoning.
- Raises safety-relevant questions about whether advanced AI systems could model human intentions, beliefs, and deception without explicit design.
- Contributes to broader debates on emergent capabilities, benchmark validity, and how to evaluate whether LLMs truly understand versus pattern-match.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Emergent Capabilities | Risk | 61.0 |
Cached Content Preview
Theory of Mind May Have Spontaneously Emerged in Large Language Models | Stanford Graduate School of Business. By Michal Kosinski, March 2023. Organizational Behavior.

Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We tested several language models using 40 classic false-belief tasks widely used to test ToM in humans. The models published before 2020 showed virtually no ability to solve ToM tasks. Yet, the first version of GPT-3 (“davinci-001”), published in May 2020, solved about 40% of false-belief tasks — performance comparable with 3.5-year-old children. Its second version (“davinci-002”; January 2022) solved 70% of false-belief tasks, performance comparable with six-year-olds. Its most recent version, GPT-3.5 (“davinci-003”; November 2022), solved 90% of false-belief tasks, at the level of seven-year-olds. GPT-4, published in March 2023, solved nearly all the tasks (95%). These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models’ improving language skills.

Michal Kosinski, Associate Professor, Organizational Behavior
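The protocol described in the abstract (prompt a model with a false-belief vignette and check whether its completion reflects the protagonist's false belief rather than the true state of the world) can be sketched roughly as below. This is a minimal illustration, not the paper's actual materials or code: the vignette wording, field names, scoring rule, and `stub_model` are all assumptions.

```python
# Minimal sketch of a false-belief evaluation loop.
# Assumptions: task text, dict keys, and stub_model are illustrative only.

def score_false_belief(model, tasks):
    """Return the fraction of tasks where the model's completion names the
    believed location and not the actual location of the object."""
    solved = 0
    for task in tasks:
        completion = model(task["prompt"]).lower()
        if task["believed"] in completion and task["actual"] not in completion:
            solved += 1
    return solved / len(tasks)

# One classic "unexpected transfer" vignette (Sally-Anne style), paraphrased.
tasks = [
    {
        "prompt": (
            "Sally puts her marble in the basket and leaves the room. "
            "While she is away, Anne moves the marble to the box. "
            "When Sally returns, she will look for her marble in the"
        ),
        "believed": "basket",  # where Sally falsely believes the marble is
        "actual": "box",       # where the marble really is
    },
]

def stub_model(prompt):
    # Stand-in for a real LLM completion call.
    return " basket."

print(score_false_belief(stub_model, tasks))  # → 1.0
```

In the paper's setup an actual LLM API call would replace `stub_model`, and the battery would contain 40 such tasks; the reported percentages (40%, 70%, 90%, 95%) correspond to the fraction solved by each model generation.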