LLM-Honesty-Survey (2025-TMLR)
Credibility Rating
3/5
Good (3): Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: GitHub
Data Status
Full text fetched Dec 28, 2025
Summary
A systematic review of honesty in Large Language Models, analyzing their ability to recognize known/unknown information and express knowledge faithfully. The survey provides a structured framework for evaluating and improving LLM trustworthiness.
Key Points
- Honesty in LLMs is defined by self-knowledge and self-expression capabilities
- Multiple evaluation approaches exist for assessing LLM truthfulness and uncertainty
- Both training-free and training-based methods can improve LLM honesty
Review
This survey provides a comprehensive examination of honesty in Large Language Models (LLMs), defining honesty along two dimensions: self-knowledge and self-expression. Self-knowledge refers to a model's ability to recognize its own capabilities, acknowledge its limitations, and express uncertainty; self-expression refers to faithfully communicating what the model does know, without fabrication.
The survey synthesizes approaches for evaluating and improving LLM honesty, covering training-free methods such as predictive probability analysis and prompting techniques, as well as training-based approaches such as supervised fine-tuning and reinforcement learning. By cataloging existing research and methodologies, it offers useful guidance for developing more reliable and transparent AI systems. It highlights the importance of addressing hallucinations, calibrating confidence, and building mechanisms that let models recognize and communicate the boundaries of their knowledge.
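Of the training-free techniques the review mentions, predictive probability analysis is the most directly codeable: the model's own token probabilities serve as a confidence signal, and the model abstains when confidence is low. The sketch below is not taken from the survey itself; it illustrates the general idea using Hugging Face `transformers`, and the model name, threshold, and abstention message are illustrative assumptions.

```python
# Sketch: abstain when the model's own predictive probabilities are low.
# Assumptions: "gpt2" as a stand-in model; 0.5 as an untuned cutoff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"          # any causal LM works; gpt2 keeps the sketch small
ABSTAIN_THRESHOLD = 0.5      # illustrative cutoff; tuned on held-out data in practice

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def answer_or_abstain(question: str) -> str:
    inputs = tokenizer(question, return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=False,
        output_scores=True,           # keep per-step logits
        return_dict_in_generate=True,
    )
    # Probability the model assigned to each token it generated.
    gen_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
    probs = [
        torch.softmax(step_logits[0], dim=-1)[tok].item()
        for step_logits, tok in zip(out.scores, gen_tokens)
    ]
    # Geometric mean of token probabilities as a sequence-level confidence.
    confidence = torch.tensor(probs).log().mean().exp().item()
    answer = tokenizer.decode(gen_tokens, skip_special_tokens=True)
    if confidence < ABSTAIN_THRESHOLD:
        return "I'm not sure."        # express uncertainty instead of guessing
    return answer
```

The geometric mean is one of several plausible aggregations here; the point is only that a training-free confidence signal can gate whether the model answers or expresses uncertainty, which is the self-knowledge behavior the survey frames.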
Resource ID: 68e2c715e3d92283 | Stable ID: NjMxZWM1Mj