sid_nnv09Wl5OQ / HumanEval: 86.5
Verdict: confirmed (95%) · 4/24/2026
Inline sourcing: confirmed
Our claim
entire record
- Benchmark: vxX2rorgxU
- Model: Grok
- Score: 86.5
- Unit: percent
- Date: February 19, 2025
- Notes: Grok 3 - Code generation from Python function docstrings with unit tests
Source evidence
0 src · 0 checks
No evidence on file.
Case № BDgqDpG3vh · Filed 4/24/2026 · Confidence 95%