Longterm Wiki
benchmark-result

Claude Opus 4.1 on SWE-bench Verified: 74.5%

Child of SWE-bench Verified

Metadata

Source Table: benchmark_results
Source ID: IYNeYuPaf3
Parent: SWE-bench Verified
Children:
Created: Apr 24, 2026, 6:46 PM
Updated: Apr 24, 2026, 6:46 PM
Synced: Apr 24, 2026, 6:46 PM

Record Data

id: IYNeYuPaf3
benchmarkId: WOSlsBTTmV
modelId: Claude Opus 4.1 (ai-model)
score: 74.5
unit: percent
date: 2025-08-05
sourceUrl:
notes: Real-world coding benchmark evaluating a model's ability to resolve GitHub issues. Score reported without extended thinking.
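
For readers who consume this record programmatically, the fields above could be modeled roughly as follows. This is a minimal sketch, not the wiki's actual schema: the class name `BenchmarkResult` and the helper `as_fraction` are hypothetical, the field types are guesses from the values shown, and the empty sourceUrl is represented as `None` by assumption.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical container mirroring the Record Data fields on this page."""
    id: str
    benchmarkId: str
    modelId: str
    score: float
    unit: str
    date: datetime.date
    sourceUrl: Optional[str]  # assumption: an empty field maps to None
    notes: str

    def as_fraction(self) -> float:
        """Return the score on a 0-1 scale; only defined for percent units."""
        if self.unit != "percent":
            raise ValueError(f"cannot convert unit {self.unit!r} to a fraction")
        return self.score / 100.0


# Values copied from the record above.
record = BenchmarkResult(
    id="IYNeYuPaf3",
    benchmarkId="WOSlsBTTmV",
    modelId="Claude Opus 4.1",
    score=74.5,
    unit="percent",
    date=datetime.date(2025, 8, 5),
    sourceUrl=None,
    notes=(
        "Real-world coding benchmark evaluating model's ability to resolve "
        "GitHub issues. Reported without extended thinking."
    ),
)
```

Keeping `unit` alongside `score` lets downstream code refuse to compare results that are not on the same scale, which is why the conversion helper checks it before dividing.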

Source Check Verdicts

confirmed (99% confidence)

Last checked: 4/24/2026

Inline sourcing: confirmed

Debug info

Thing ID: IYNeYuPaf3

Source Table: benchmark_results

Source ID: IYNeYuPaf3

Parent Thing ID: WOSlsBTTmV