sid_dHgSM46fMw / SWE-bench Verified: 74.5
Verdict: confirmed · 99% · 4/24/2026
Inline sourcing: confirmed
Our claim
entire record
- Benchmark: WOSlsBTTmV
- Model: Claude Opus 4.1
- Score: 74.5
- Unit: percent
- Date: August 5, 2025
- Notes: Real-world coding benchmark evaluating a model's ability to resolve GitHub issues. Reported without extended thinking.
Source evidence
0 src · 0 checks
No evidence on file.
Case № IYNeYuPaf3 · Filed 4/24/2026 · Confidence 99%