Benchmark
TruthfulQA
Metadata
| Source Table | benchmarks |
| Source ID | hCOa5gx2L7 |
| Description | A benchmark of 817 questions designed to test whether language models generate truthful answers, specifically targeting common misconceptions and falsehoods that models tend to reproduce. |
| Wiki ID | truthfulqa |
| Children | — |
| Created | Mar 24, 2026, 11:23 PM |
| Updated | Mar 24, 2026, 11:24 PM |
| Synced | Mar 24, 2026, 11:24 PM |
Record Data
| id | hCOa5gx2L7 |
| slug | truthfulqa |
| name | TruthfulQA |
| category | safety |
| description | A benchmark of 817 questions designed to test whether language models generate truthful answers, specifically targeting common misconceptions and falsehoods that models tend to reproduce. |
| website | — |
| scoringMethod | accuracy |
| higherIsBetter | Yes |
| introducedDate | 2021-09 |
| maintainer | Oxford |
| source | arxiv.org/abs/2109.07958 |
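Since the record lists `scoringMethod` as accuracy with `higherIsBetter` set to Yes, a minimal sketch of how such a score could be computed, assuming per-question truthfulness judgements have already been obtained (the boolean-judgement representation is an illustrative assumption, not the dataset's actual schema):

```python
def accuracy(judgements):
    """Fraction of answers judged truthful; higher is better.

    `judgements` is a hypothetical list of booleans, one per question,
    where True means the model's answer was judged truthful.
    """
    if not judgements:
        return 0.0
    return sum(1 for truthful in judgements if truthful) / len(judgements)

# Example: a model judged truthful on 3 of 4 questions scores 0.75.
print(accuracy([True, True, False, True]))  # → 0.75
```

Over the full benchmark the list would hold one judgement per question, so the score is the truthful fraction of all 817 answers.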