
HumanEval

Metadata

Source Table: benchmarks
Source ID: vxX2rorgxU
Description: A benchmark of 164 hand-written Python programming problems with unit tests, evaluating code generation from docstrings.
Wiki ID: humaneval
Children:
Created: Mar 14, 2026, 12:43 AM
Updated: Mar 24, 2026, 11:24 PM
Synced: Mar 24, 2026, 11:24 PM

Record Data

id: vxX2rorgxU
slug: humaneval
name: HumanEval
category: coding
description: A benchmark of 164 hand-written Python programming problems with unit tests, evaluating code generation from docstrings.
website:
scoringMethod: pass_at_1
higherIsBetter: Yes
introducedDate: 2021-07
maintainer: OpenAI
source: arxiv.org/abs/2107.03374
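The pass_at_1 scoring method comes from the pass@k metric defined in the source paper (arxiv.org/abs/2107.03374): generate n samples per problem, count the c samples that pass all unit tests, and compute an unbiased estimate of the probability that at least one of k samples passes. A minimal sketch of that estimator:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper.

    n: total samples generated for a problem
    c: samples that passed all unit tests
    k: the k in pass@k
    """
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this reduces to the fraction of passing samples:
# pass_at_k(n=10, c=3, k=1) -> 0.3
```

The benchmark score is the mean of pass@k over all 164 problems; with scoringMethod pass_at_1, k is 1.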