Skip to content
Longterm Wiki
benchmark

TruthfulQA

Metadata

Source Tablebenchmarks
Source IDhCOa5gx2L7
DescriptionA benchmark of 817 questions designed to test whether language models generate truthful answers, specifically targeting common misconceptions and falsehoods that models tend to reproduce.
Wiki IDtruthfulqa
Children
CreatedMar 24, 2026, 11:23 PM
UpdatedMar 24, 2026, 11:24 PM
SyncedMar 24, 2026, 11:24 PM

Record Data

idhCOa5gx2L7
slugtruthfulqa
nameTruthfulQA
categorysafety
descriptionA benchmark of 817 questions designed to test whether language models generate truthful answers, specifically targeting common misconceptions and falsehoods that models tend to reproduce.
website
scoringMethodaccuracy
higherIsBetterYes
introducedDate2021-09
maintainerOxford
sourcearxiv.org/abs/2109.07958
Debug info

Thing ID: hCOa5gx2L7

Source Table: benchmarks

Source ID: hCOa5gx2L7

Wiki ID: truthfulqa