Benchmark
TruthfulQA
Metadata
| Source Table | benchmarks |
| Source ID | hCOa5gx2L7 |
| Description | A benchmark of 817 questions designed to test whether language models generate truthful answers, specifically targeting common misconceptions and falsehoods that models tend to reproduce. |
| Wiki ID | truthfulqa |
| Children | — |
| Created | Mar 24, 2026, 11:23 PM |
| Updated | Mar 24, 2026, 11:24 PM |
| Synced | Mar 24, 2026, 11:24 PM |
Record Data
| id | hCOa5gx2L7 |
| slug | truthfulqa |
| name | TruthfulQA |
| category | safety |
| description | A benchmark of 817 questions designed to test whether language models generate truthful answers, specifically targeting common misconceptions and falsehoods that models tend to reproduce. |
| website | — |
| scoringMethod | accuracy |
| higherIsBetter | Yes |
| introducedDate | 2021-09 |
| maintainer | Oxford |
| source | arxiv.org/abs/2109.07958 |
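Since the record lists `scoringMethod` as accuracy with `higherIsBetter` set to Yes, a minimal sketch of how such a score could be computed, assuming per-question truthfulness judgements have already been obtained (the boolean-judgement representation is an illustrative assumption, not the dataset's actual schema):

```python
def accuracy(judgements):
    """Fraction of answers judged truthful; higher is better.

    `judgements` is a hypothetical list of booleans, one per question,
    where True means the model's answer was judged truthful.
    """
    if not judgements:
        return 0.0
    return sum(1 for truthful in judgements if truthful) / len(judgements)

# Example: a model judged truthful on 3 of 4 questions scores 0.75.
print(accuracy([True, True, False, True]))  # → 0.75
```

Over the full benchmark the list would hold one judgement per question, so the score is the truthful fraction of all 817 answers.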