Claude Opus 4.1 on SWE-bench Verified: 74.5%

benchmark-result

Metadata

`id`	IYNeYuPaf3
`benchmarkId`	WOSlsBTTmV
`modelId`	Claude Opus 4.1 (ai-model)
`score`	74.5
`unit`	percent
`date`	2025-08-05
`sourceUrl`	—
`notes`	Real-world coding benchmark that evaluates a model's ability to resolve GitHub issues. Score reported without extended thinking.
`testedBy`	unknown
`testedByOrgId`	—
`evaluationDate`	—
`methodologyNotes`	—

confirmed (99% confidence)

Last checked: 4/24/2026

Inline sourcing: confirmed
