sid_ISfAiImMYg / SWE-bench Verified: 49
Verdict: confirmed (98%) · 4/24/2026 · Inline sourcing: confirmed
Our claim
- Benchmark: WOSlsBTTmV
- Model: Claude 3.5 Sonnet
- Score: 49
- Unit: percent
- Date: October 22, 2024
- Notes: Updated Claude 3.5 Sonnet; real-world software engineering (GitHub issues)
Source evidence
0 sources · 0 checks · No evidence on file.
Case № ZrlsN6im3t · Filed 4/24/2026 · Confidence 98%