Benchmark Result · BDgqDpG3vh

sid_nnv09Wl5OQ / HumanEval: 86.5

Verdict: confirmed (95% confidence) · 4/24/2026

Inline sourcing: confirmed

Our claim

entire record

Benchmark: vxX2rorgxU
Model: Grok
Score: 86.5
Unit: percent
Date: February 19, 2025
Notes: Grok 3 - Code generation from Python function docstrings with unit tests
Tested By: unknown

0 sources · 0 checks

No evidence on file.

Case № BDgqDpG3vh · Filed 4/24/2026 · Confidence 95%