Theory of Mind May Have Spontaneously Emerged in Large Language Models (Kosinski, 2023)
A widely cited but disputed 2023 paper claiming emergent theory of mind in GPT-4; important for discussions of unpredictable capability emergence and the difficulty of evaluating whether AI systems model human mental states, with direct implications for deception and manipulation risks.
Summary
Michal Kosinski's influential and controversial study argues that large language models, particularly GPT-4, spontaneously developed theory of mind (ToM), the ability to attribute mental states to others, as an emergent property of scale. The paper presents benchmark results suggesting GPT-4 solves nearly all (95%) of the classic false-belief tasks tested. These claims sparked significant debate about whether LLMs genuinely reason about mental states or merely exploit statistical patterns.
Key Points
- GPT-4 reportedly solved 95% of ToM tasks, comparable to 9-year-old human performance, despite not being explicitly trained for this capability.
- Theory of mind appears to have emerged spontaneously as model scale increased, suggesting capabilities can arise unpredictably from scaling alone.
- The findings are highly contested: critics argue LLMs may exploit training data contamination or surface-level patterns rather than genuine mental state reasoning.
- Raises safety-relevant questions about whether advanced AI systems could model human intentions, beliefs, and deception without explicit design.
- Contributes to broader debates on emergent capabilities, benchmark validity, and how to evaluate whether LLMs truly understand versus pattern-match.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Emergent Capabilities | Risk | 61.0 |
Cached Content Preview
Theory of Mind May Have Spontaneously Emerged in Large Language Models | Stanford Graduate School of Business. By Michal Kosinski, March 2023. Organizational Behavior.

Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We tested several language models using 40 classic false-belief tasks widely used to test ToM in humans. The models published before 2020 showed virtually no ability to solve ToM tasks. Yet, the first version of GPT-3 (“davinci-001”), published in May 2020, solved about 40% of false-belief tasks — performance comparable with 3.5-year-old children. Its second version (“davinci-002”; January 2022) solved 70% of false-belief tasks, performance comparable with six-year-olds. Its most recent version, GPT-3.5 (“davinci-003”; November 2022), solved 90% of false-belief tasks, at the level of seven-year-olds. GPT-4, published in March 2023, solved nearly all the tasks (95%). These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models’ improving language skills.

Michal Kosinski, Associate Professor, Organizational Behavior
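The protocol described in the abstract (prompt a model with a false-belief vignette and check whether its completion reflects the protagonist's false belief rather than the true state of the world) can be sketched roughly as below. This is a minimal illustration, not the paper's actual materials or code: the vignette wording, field names, scoring rule, and `stub_model` are all assumptions.

```python
# Minimal sketch of a false-belief evaluation loop.
# Assumptions: task text, dict keys, and stub_model are illustrative only.

def score_false_belief(model, tasks):
    """Return the fraction of tasks where the model's completion names the
    believed location and not the actual location of the object."""
    solved = 0
    for task in tasks:
        completion = model(task["prompt"]).lower()
        if task["believed"] in completion and task["actual"] not in completion:
            solved += 1
    return solved / len(tasks)

# One classic "unexpected transfer" vignette (Sally-Anne style), paraphrased.
tasks = [
    {
        "prompt": (
            "Sally puts her marble in the basket and leaves the room. "
            "While she is away, Anne moves the marble to the box. "
            "When Sally returns, she will look for her marble in the"
        ),
        "believed": "basket",  # where Sally falsely believes the marble is
        "actual": "box",       # where the marble really is
    },
]

def stub_model(prompt):
    # Stand-in for a real LLM completion call.
    return " basket."

print(score_false_belief(stub_model, tasks))  # → 1.0
```

In the paper's setup an actual LLM API call would replace `stub_model`, and the battery would contain 40 such tasks; the reported percentages (40%, 70%, 90%, 95%) correspond to the fraction solved by each model generation.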