Benchmark Result · IYNeYuPaf3

sid_dHgSM46fMw / SWE-bench Verified: 74.5

Verdict: confirmed (99%) · 4/24/2026

Inline sourcing: confirmed

Our claim

entire record

Benchmark: WOSlsBTTmV
Model: Claude Opus 4.1
Score: 74.5
Unit: percent
Date: August 5, 2025
Notes: Real-world coding benchmark evaluating a model's ability to resolve GitHub issues. Reported without extended thinking.
Tested By: unknown

0 src · 0 checks

No evidence on file.

Case № IYNeYuPaf3 · Filed 4/24/2026 · Confidence 99%