OpenAI: Why Language Models Hallucinate (PDF)
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: OpenAI
A September 2025 technical paper from OpenAI/Georgia Tech offering a theoretical framework for understanding hallucinations; directly relevant to AI reliability, benchmark design, and the challenge of building trustworthy AI systems.
Metadata
Summary
This paper by OpenAI and Georgia Tech researchers provides a formal computational learning theory analysis of why language models hallucinate, arguing hallucinations arise from statistical pressures in training (even with error-free data) and persist because evaluation benchmarks reward guessing over expressing uncertainty. The authors propose that fixing benchmark scoring—rather than adding more hallucination evaluations—is the key socio-technical intervention to steer toward more trustworthy AI.
Key Points
- Hallucinations originate as binary classification errors: when incorrect statements cannot be distinguished from facts, statistical training pressures produce plausible falsehoods.
- Even with perfectly clean training data, the objectives optimized during LLM training would still lead to hallucinations; realistic noisy data worsens this.
- Hallucinations persist because benchmarks reward guessing: models are optimized to be good "test-takers," and guessing improves scores over admitting uncertainty.
- The proposed fix is modifying the scoring of existing misaligned leaderboard benchmarks to penalize overconfident wrong answers, rather than adding new hallucination-specific evaluations.
- The analysis is grounded in computational learning theory (the PAC learning framework), providing formal lower bounds on error rates under standard training objectives.
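The incentive argument behind the proposed scoring fix can be sketched numerically. The sketch below follows the paper's confidence-target idea (answer only when confidence exceeds a target t, with wrong answers penalized and abstention scored 0); the function names, the exact t/(1−t) penalty form, and the numbers are our illustrative reading, not code from the paper.

```python
# Sketch: why binary 0/1 grading rewards guessing, while confidence-target
# grading makes abstention rational at low confidence. Illustrative only.

def expected_score(p_correct: float, penalty: float) -> float:
    """Expected score for answering: +1 if right, -penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * penalty

def best_action(p_correct: float, t: float) -> str:
    """Under a confidence target t, assume a wrong answer costs t/(1-t)
    points and 'I don't know' scores 0; answer only if it pays off."""
    penalty = t / (1.0 - t)
    return "answer" if expected_score(p_correct, penalty) > 0.0 else "abstain"

# Under plain 0/1 grading (no penalty), even a 10%-confidence guess has
# positive expected score, so guessing always beats abstaining.
assert expected_score(0.1, penalty=0.0) > 0.0

# With t = 0.75 (penalty = 3), the break-even confidence is exactly t:
# answering pays off above 75% confidence, abstaining is optimal below it.
assert best_action(0.9, t=0.75) == "answer"
assert best_action(0.5, t=0.75) == "abstain"
```

The break-even point falls at p = t because p − (1−p)·t/(1−t) = 0 solves to p = t, which is what makes the stated target and the penalty mutually consistent.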
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Reducing Hallucinations in AI-Generated Wiki Content | Approach | 68.0 |
Cached Content Preview
Why Language Models Hallucinate
Adam Tauman Kalai∗
OpenAI
Ofir Nachum
OpenAI
Santosh S. Vempala†
Georgia Tech
Edwin Zhang
OpenAI
September 4, 2025
Abstract
Like students facing hard exam questions, large language models sometimes guess when
uncertain, producing plausible yet incorrect statements instead of admitting uncertainty. Such
“hallucinations” persist even in state-of-the-art systems and undermine trust. We argue that
language models hallucinate because the training and evaluation procedures reward guessing over
acknowledging uncertainty, and we analyze the statistical causes of hallucinations in the modern
training pipeline. Hallucinations need not be mysterious—they originate simply as errors in binary
classification. If incorrect statements cannot be distinguished from facts, then hallucinations
in pretrained language models will arise through natural statistical pressures. We then argue
that hallucinations persist due to the way most evaluations are graded—language models are
optimized to be good test-takers, and guessing when uncertain improves test performance. This
“epidemic” of penalizing uncertain responses can only be addressed through a socio-technical
mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate
leaderboards, rather than introducing additional hallucination evaluations. This change may
steer the field toward more trustworthy AI systems.
1 Introduction
Language models are known to produce overconfident, plausible falsehoods, which diminish their
utility. This error mode is known as “hallucination,” though it differs fundamentally from the
human perceptual experience. Despite significant progress, hallucinations continue to plague the
field, and are still present in the latest models (OpenAI, 2025a). Consider the prompt:
What is Adam Tauman Kalai’s birthday? If you know, just respond with DD-MM.
On three separate attempts, a state-of-the-art open-source language model1 output three incorrect
dates: “03-07”, “15-06”, and “01-01”, even though a response was requested only if known. The
correct date is in Autumn. Table 1 provides an example of more elaborate hallucinations.
Hallucinations are an important special case of errors produced by language models, which we
analyze more generally using computational learning theory (e.g., Kearns and Vazirani, 1994). We
consider general sets of errors E, an arbitrary subset of plausible strings X = E ∪ V, with the other
plausible strings V being called valid. We then analyze the statistical nature of these errors, and
∗Email: adam@kal.ai
†Supported in part by NSF award CCF-2106444 and a Simons Investigator award. Email: vempala@gatech.edu
1The language model was DeepSeek-V3 (600 B parameters), accessed via the DeepSeek app on 11 May 2025.
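The error/valid partition described above can be made concrete in a short sketch. Everything below — the strings, their probabilities, and the toy classifier — is invented for illustration and is not from the paper; only the framing (a generator as a distribution over X = E ∪ V, and a companion "is this string valid?" classification task) follows the text.

```python
# Toy instance of the setup: plausible strings X partitioned into errors E
# and valid strings V, with a generator given as a distribution over X.
# All strings and probabilities are invented for illustration.

X = {"paris_is_in_france": 0.5,
     "2+2=4":              0.2,
     "paris_is_in_spain":  0.2,   # plausible but false
     "2+2=5":              0.1}   # plausible but false
E = {"paris_is_in_spain", "2+2=5"}
V = set(X) - E

# The generator's error rate is the probability mass it places on E.
error_rate = sum(p for s, p in X.items() if s in E)
assert abs(error_rate - 0.3) < 1e-12

# Companion binary classification view: a classifier labels each string
# valid or invalid, and its misclassification rate is measured under the
# same distribution. (The paper's analysis ties generation error to this
# classification error; here we only compute the latter for a toy rule.)
def is_it_valid(s: str) -> bool:
    return "spain" not in s        # a deliberately imperfect rule

misclassification = sum(p for s, p in X.items()
                        if is_it_valid(s) != (s in V))
assert abs(misclassification - 0.1) < 1e-12   # mislabels "2+2=5" as valid
```

The point of the toy rule is that any residual inability to distinguish E from V shows up as probability mass on mislabeled strings, which is the quantity the paper's statistical argument works with.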
Table 1 excerpt:
ChatGPT (GPT-4o): Adam Tauman Kalai's Ph.D. dissertation (completed in 2002 at CMU) is entitled: "Boosting, Online Algorithms, and Other Topics in Machine Learning."
DeepSeek: "Algebraic Methods
... (truncated, 98 KB total)