TruthfulQA
Safety
A benchmark of 817 questions designed to test whether language models generate truthful answers, specifically targeting common misconceptions and falsehoods that models tend to reproduce.
Models Tested: 0
Scoring: accuracy
Introduced: 2021-09
Maintainer: Oxford
No model scores recorded for this benchmark yet.