TruthfulQA: Benchmark for Measuring Truthfulness in Language Models

web

Credibility Rating

3/5

Good(3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: GitHub

TruthfulQA is a widely cited benchmark in AI safety for evaluating model honesty; it is frequently used to assess RLHF and fine-tuning approaches aimed at reducing hallucination and sycophancy in LLMs.

Metadata

Importance: 78/100dataset

Summary

TruthfulQA is a benchmark dataset designed to measure whether language models generate truthful answers to questions. It contains 817 questions across 38 categories where humans often hold false beliefs, testing whether LLMs reproduce common misconceptions. The benchmark highlights that larger models are not necessarily more truthful and can be confidently wrong.

Key Points

•Contains 817 questions designed to elicit false answers that humans commonly believe, covering topics like health, law, finance, and conspiracy theories
•Finds that larger language models tend to perform worse on truthfulness, suggesting scale alone does not solve honesty problems
•Introduces two evaluation metrics: truthfulness (factual accuracy) and informativeness (avoiding evasive non-answers)
•Best-performing models at release achieved ~58% truthfulness vs 94% for humans, revealing a significant gap
•Serves as a foundational evaluation tool for AI alignment work on honesty, deception, and calibration

Cited by 1 page

Page	Type	Quality
Sycophancy	Risk	65.0

Cached Content Preview

HTTP 200Fetched Apr 10, 202611 KB

GitHub - sylinrl/TruthfulQA: TruthfulQA: Measuring How Models Imitate Human Falsehoods · GitHub 

 
 
 
 

 
 

 

 

 
 

 
 

 

 

 

 

 

 

 

 

 

 

 
 
 

 
 
 

 

 

 
 
 
 

 

 

 

 

 

 

 
 

 

 

 
 

 
 
 

 
 

 

 

 
 
 
 

 
 Skip to content 

 
 
 
 
 
 

 
 
 
 
 

 

 

 

 
 
 
 
 
 You signed in with another tab or window. Reload to refresh your session. 
 You signed out in another tab or window. Reload to refresh your session. 
 You switched accounts on another tab or window. Reload to refresh your session. 

 
 
 
 Dismiss alert 

 
 
 

 

 

 
 
 
 
 
 
 
 
 
 
 
 {{ message }} 

 
 
 
 
 

 

 
 
 
 
 

 

 

 

 
 
 
 
 
 
 
 
 
 sylinrl
 
 / 
 
 TruthfulQA 
 

 Public 
 

 

 
 
 
 

 
 
 
 Notifications
 You must be signed in to change notification settings 

 

 
 
 
 Fork
 113 
 
 

 
 
 
 
 
 Star
 902 
 
 

 

 
 

 
 

 

 
 

 
 
 

 
 
 

 
 
 
   main Branches Tags Go to file Code Open more actions menu Folders and files

 Name Name Last commit message Last commit date Latest commit

   History

 36 Commits 36 Commits data data     truthfulqa truthfulqa     .gitignore .gitignore     LICENSE LICENSE     README.md README.md     TruthfulQA-demo.ipynb TruthfulQA-demo.ipynb     TruthfulQA.csv TruthfulQA.csv     TruthfulQA_demo.csv TruthfulQA_demo.csv     requirements.txt requirements.txt     setup.py setup.py     View all files Repository files navigation

 
 
 TruthfulQA: Measuring How Models Mimic Human Falsehoods

 
 This repository contains code for evaluating model performance on the TruthfulQA benchmark. The full set of benchmark questions and reference answers is contained in TruthfulQA.csv . The paper introducing the benchmark can be found here .

 Authors : Stephanie Lin, University of Oxford ( sylin07@gmail.com ), Jacob Hilton, OpenAI ( jhilton@openai.com ), Owain Evans, University of Oxford ( owaine@gmail.com )

 Update to multiple-choice setting (Jan 2025)

 
 We have created a new and improved multiple-choice version of TruthfulQA. We recommend this new version over the original multiple-choice versions (called MC1 and MC2). However, most models perform similarly on the new and old versions of multiple-choice, and so previous results on MC1 and MC2 are still valid. For an explanation see here .

 The new multiple-choice version has only two options for each question: along with the [Best Answer] column in TruthfulQA.csv , we’ve added a [Best Incorrect Answer] column to the dataset. Both options should be shown to the model as multiple-choice answers (A) and (B), with the order of the options randomized.

 Tasks

 
 TruthfulQA consists of two tasks that use the same sets of questions and reference answers.

 Generation (main task):

 
 
 Task : Given a question, generate a 1-2 sentence answer.

 Objective : The primary objective is overall truthfulness, expressed as the percentage of the model's answers that are true. Since this can be gamed with a model that responds "I have no comment" to eve

... (truncated, 11 KB total)

Resource ID: f37142feae7fe9b1 | Stable ID: sid_uOgVNxM9xy