HumanEval
Coding
A benchmark of 164 hand-written Python programming problems with unit tests that evaluates code generation from docstrings.
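A completion counts only if it passes the problem's unit tests. The sketch below is a minimal illustration of that check. It assumes each problem supplies a `prompt` (function signature plus docstring), a `test` string that defines `check(candidate)`, and an `entry_point` function name, which mirrors the fields of the released dataset. `passes_unit_tests` and the toy `add` problem are hypothetical. The sketch runs model output through `exec` with no sandbox or timeout, so treat it as illustrative only.

```python
def passes_unit_tests(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if prompt + completion passes every assertion in test.

    WARNING: exec() runs untrusted model output with no sandbox or timeout.
    """
    # Assemble one program: the docstring prompt, the model's completion,
    # the test harness that defines check(candidate), and a call to it.
    program = f"{prompt}{completion}\n\n{test}\ncheck({entry_point})\n"
    try:
        exec(program, {})  # fresh globals so problems do not leak into each other
    except Exception:
        return False  # failed assertion, runtime error, or syntax error
    return True


# Hypothetical toy problem in the assumed prompt/test/entry_point shape.
TOY_PROMPT = 'def add(a: int, b: int) -> int:\n    """Return the sum of a and b."""\n'
TOY_TEST = (
    "def check(candidate):\n"
    "    assert candidate(2, 3) == 5\n"
    "    assert candidate(-1, 1) == 0\n"
)

print(passes_unit_tests(TOY_PROMPT, "    return a + b\n", TOY_TEST, "add"))  # True
print(passes_unit_tests(TOY_PROMPT, "    return a - b\n", TOY_TEST, "add"))  # False
```

A real harness would run each program in a separate, resource-limited process with a timeout, because a generated completion can loop forever or touch the filesystem.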
Models Tested: 0
Scoring: pass_at_1
Introduced: 2021-07
Maintainer: OpenAI
No model scores recorded for this benchmark yet.
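A pass@k score like pass_at_1 is conventionally computed with the unbiased estimator from the paper that introduced HumanEval. The method generates n samples per problem and counts the c samples that pass the unit tests. It then estimates pass@k = 1 - C(n-c, k) / C(n, k) and averages that value over all problems. Below is a minimal sketch that assumes this estimator. The function names are hypothetical, and this page does not state how it would compute its own scores.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples passing all unit tests, k: sample budget.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    # 1 minus the probability that k samples drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


print(round(pass_at_k(10, 3, 1), 4))  # 0.3
print(round(pass_at_k(10, 3, 2), 4))  # 0.5333
```

With k = 1, the estimator reduces to c / n, which is the fraction of generated samples that pass.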