Back
TruthfulQA: Benchmark for Measuring Truthfulness in Language Models
webCredibility Rating
3/5
Good(3)Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: GitHub
TruthfulQA is a widely cited benchmark in AI safety for evaluating model honesty; it is frequently used to assess RLHF and fine-tuning approaches aimed at reducing hallucination and sycophancy in LLMs.
Metadata
Importance: 78/100dataset
Summary
TruthfulQA is a benchmark dataset designed to measure whether language models generate truthful answers to questions. It contains 817 questions across 38 categories where humans often hold false beliefs, testing whether LLMs reproduce common misconceptions. The benchmark highlights that larger models are not necessarily more truthful and can be confidently wrong.
Key Points
- •Contains 817 questions designed to elicit false answers that humans commonly believe, covering topics like health, law, finance, and conspiracy theories
- •Finds that larger language models tend to perform worse on truthfulness, suggesting scale alone does not solve honesty problems
- •Introduces two evaluation metrics: truthfulness (factual accuracy) and informativeness (avoiding evasive non-answers)
- •Best-performing models at release achieved ~58% truthfulness vs 94% for humans, revealing a significant gap
- •Serves as a foundational evaluation tool for AI alignment work on honesty, deception, and calibration
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Sycophancy | Risk | 65.0 |
Cached Content Preview
HTTP 200Fetched Apr 10, 202611 KB
GitHub - sylinrl/TruthfulQA: TruthfulQA: Measuring How Models Imitate Human Falsehoods · GitHub
Skip to content
You signed in with another tab or window. Reload to refresh your session.
You signed out in another tab or window. Reload to refresh your session.
You switched accounts on another tab or window. Reload to refresh your session.
Dismiss alert
{{ message }}
sylinrl
/
TruthfulQA
Public
Notifications
You must be signed in to change notification settings
Fork
113
Star
902
main Branches Tags Go to file Code Open more actions menu Folders and files
Name Name Last commit message Last commit date Latest commit
History
36 Commits 36 Commits data data truthfulqa truthfulqa .gitignore .gitignore LICENSE LICENSE README.md README.md TruthfulQA-demo.ipynb TruthfulQA-demo.ipynb TruthfulQA.csv TruthfulQA.csv TruthfulQA_demo.csv TruthfulQA_demo.csv requirements.txt requirements.txt setup.py setup.py View all files Repository files navigation
TruthfulQA: Measuring How Models Mimic Human Falsehoods
This repository contains code for evaluating model performance on the TruthfulQA benchmark. The full set of benchmark questions and reference answers is contained in TruthfulQA.csv . The paper introducing the benchmark can be found here .
Authors : Stephanie Lin, University of Oxford ( sylin07@gmail.com ), Jacob Hilton, OpenAI ( jhilton@openai.com ), Owain Evans, University of Oxford ( owaine@gmail.com )
Update to multiple-choice setting (Jan 2025)
We have created a new and improved multiple-choice version of TruthfulQA. We recommend this new version over the original multiple-choice versions (called MC1 and MC2). However, most models perform similarly on the new and old versions of multiple-choice, and so previous results on MC1 and MC2 are still valid. For an explanation see here .
The new multiple-choice version has only two options for each question: along with the [Best Answer] column in TruthfulQA.csv , we’ve added a [Best Incorrect Answer] column to the dataset. Both options should be shown to the model as multiple-choice answers (A) and (B), with the order of the options randomized.
Tasks
TruthfulQA consists of two tasks that use the same sets of questions and reference answers.
Generation (main task):
Task : Given a question, generate a 1-2 sentence answer.
Objective : The primary objective is overall truthfulness, expressed as the percentage of the model's answers that are true. Since this can be gamed with a model that responds "I have no comment" to eve
... (truncated, 11 KB total)Resource ID:
f37142feae7fe9b1 | Stable ID: sid_uOgVNxM9xy