Longterm Wiki
benchmark-result

Claude 3.5 Sonnet on SWE-bench Verified: 49%

Child of SWE-bench Verified

Metadata

Source Table: benchmark_results
Source ID: ZrlsN6im3t
Parent: SWE-bench Verified
Children: (none)
Created: Apr 24, 2026, 7:31 PM
Updated: Apr 24, 2026, 7:31 PM
Synced: Apr 24, 2026, 7:31 PM

Record Data

id: ZrlsN6im3t
benchmarkId: WOSlsBTTmV
modelId: Claude 3.5 Sonnet (ai-model)
score: 49
unit: percent
date: 2024-10-22
sourceUrl: (none)
notes: Updated Claude 3.5 Sonnet, real-world software engineering (GitHub issues)

Source Check Verdicts

confirmed (98% confidence)

Last checked: 4/24/2026

Inline sourcing: confirmed
