Longterm Wiki

sid_nnv09Wl5OQ / HumanEval: 86.5

Verdict: confirmed (95% confidence) · 4/24/2026

Inline sourcing: confirmed

Our claim

entire record
Benchmark: vxX2rorgxU
Model: Grok
Score: 86.5
Unit: percent
Date: February 19, 2025
Notes: Grok 3; code generation from Python function docstrings, scored with unit tests

Source evidence

0 sources · 0 checks

No evidence on file.

Case № BDgqDpG3vh · Filed 4/24/2026 · Confidence 95%