Longterm Wiki

LLM-Honesty-Survey (2025-TMLR)


Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: GitHub

Data Status

Full text fetched Dec 28, 2025

Summary

A systematic review of honesty in Large Language Models, analyzing their ability to recognize known/unknown information and express knowledge faithfully. The survey provides a structured framework for evaluating and improving LLM trustworthiness.

Key Points

  • Honesty in LLMs defined by self-knowledge and self-expression capabilities
  • Multiple evaluation approaches exist for assessing LLM truthfulness and uncertainty
  • Both training-free and training-based methods can improve LLM honesty

Review

This survey provides a comprehensive examination of honesty in Large Language Models (LLMs), defining it along two dimensions: self-knowledge and self-expression. Self-knowledge refers to a model's ability to recognize its own capabilities, acknowledge limitations, and express uncertainty; self-expression refers to faithfully communicating its acquired knowledge without fabrication. The survey synthesizes approaches for evaluating and improving LLM honesty, including training-free methods such as predictive probability analysis and prompting techniques, and training-based approaches such as supervised fine-tuning and reinforcement learning. By cataloging existing research and methodologies, it offers insights into developing more reliable and transparent AI systems. In particular, it highlights the importance of addressing hallucinations, calibrating confidence, and building mechanisms that let models recognize and communicate the boundaries of their knowledge.
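The training-free predictive-probability idea mentioned above can be sketched as a simple confidence threshold over a model's output distribution: answer only when the top predicted probability is high enough, otherwise abstain. This is a minimal illustrative sketch, not a method from the survey; the function names and the 0.6 threshold are assumptions chosen for the example.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def answer_or_abstain(logits, threshold=0.6):
    """Training-free honesty heuristic: return the top answer index only
    when its predictive probability clears the threshold; otherwise
    return None, signalling an honest "I don't know" abstention."""
    probs = softmax(logits)
    confidence = max(probs)
    top_index = probs.index(confidence)
    if confidence >= threshold:
        return top_index, confidence
    return None, confidence  # abstain: confidence too low to answer

# A peaked distribution yields an answer; a flat one triggers abstention.
print(answer_or_abstain([2.0, 0.5, 0.1]))   # high confidence -> answers
print(answer_or_abstain([0.2, 0.1, 0.0]))   # low confidence -> abstains
```

In practice the logits would come from a real LLM's output head (e.g. per-token scores from a generation API), and the threshold would be tuned on held-out data so that abstentions track actual error rates, which is the calibration concern the survey raises.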
Resource ID: 68e2c715e3d92283 | Stable ID: NjMxZWM1Mj