Longterm Wiki
benchmark-result

Grok on HumanEval: 86.5

Child of HumanEval

Metadata

Source Table: benchmark_results
Source ID: BDgqDpG3vh
Parent: HumanEval
Children
Created: Apr 24, 2026, 7:13 PM
Updated: Apr 24, 2026, 7:13 PM
Synced: Apr 24, 2026, 7:13 PM

Record Data

id: BDgqDpG3vh
benchmarkId: vxX2rorgxU
modelId: Grok (ai-model)
score: 86.5
unit: percent
date: 2025-02-19
sourceUrl
notes: Grok 3 - Code generation from Python function docstrings with unit tests

Source Check Verdicts

confirmed (95% confidence)

Last checked: 4/24/2026

Inline sourcing: confirmed

Debug info

Thing ID: BDgqDpG3vh

Source Table: benchmark_results

Source ID: BDgqDpG3vh

Parent Thing ID: vxX2rorgxU