Longterm Wiki
benchmark

SWE-bench Verified

Metadata

Source Table: benchmarks
Source ID: WOSlsBTTmV
Description: A curated subset of SWE-bench with human-verified task instances for evaluating AI systems on real-world software engineering tasks from GitHub issues.
Source URL: www.swebench.com/
Wiki ID: swe-bench-verified
Children
Created: Mar 14, 2026, 12:43 AM
Updated: Mar 24, 2026, 11:24 PM
Synced: Mar 24, 2026, 11:24 PM

Record Data

id: WOSlsBTTmV
slug: swe-bench-verified
name: SWE-bench Verified
category: coding
description: A curated subset of SWE-bench with human-verified task instances for evaluating AI systems on real-world software engineering tasks from GitHub issues.
website: www.swebench.com/
scoringMethod: percentage
higherIsBetter: Yes
introducedDate: 2024-08
maintainer: OpenAI / Princeton NLP
source: arxiv.org/abs/2310.06770