benchmark
SWE-bench Verified
Metadata
| Source Table | benchmarks |
| Source ID | WOSlsBTTmV |
| Description | A curated subset of SWE-bench with human-verified task instances for evaluating AI systems on real-world software engineering tasks from GitHub issues. |
| Source URL | www.swebench.com/ |
| Wiki ID | swe-bench-verified |
| Children | — |
| Created | Mar 14, 2026, 12:43 AM |
| Updated | Mar 24, 2026, 11:24 PM |
| Synced | Mar 24, 2026, 11:24 PM |
Record Data
| id | WOSlsBTTmV |
| slug | swe-bench-verified |
| name | SWE-bench Verified |
| category | coding |
| description | A curated subset of SWE-bench with human-verified task instances for evaluating AI systems on real-world software engineering tasks from GitHub issues. |
| website | www.swebench.com/ |
| scoringMethod | percentage |
| higherIsBetter | Yes |
| introducedDate | 2024-08 |
| maintainer | OpenAI / Princeton NLP |
| source | arxiv.org/abs/2310.06770 |
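The `scoringMethod` and `higherIsBetter` fields in the record can be illustrated with a minimal sketch. It assumes the score is the percentage of task instances a system resolves. The function names `resolved_percentage` and `is_improvement` are hypothetical and are not part of any SWE-bench tooling, and the counts in the usage lines are made-up inputs, not reported results.

```python
def resolved_percentage(resolved: int, total: int) -> float:
    """Return the share of resolved task instances as a percentage (0 to 100)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return 100.0 * resolved / total


def is_improvement(new_score: float, old_score: float,
                   higher_is_better: bool = True) -> bool:
    """Compare two scores, honoring the record's higherIsBetter flag."""
    if higher_is_better:
        return new_score > old_score
    return new_score < old_score


# Illustrative inputs only (assumed, not taken from any leaderboard).
score_a = resolved_percentage(resolved=250, total=500)
score_b = resolved_percentage(resolved=300, total=500)
print(score_a, score_b, is_improvement(score_b, score_a))
```

Because `higherIsBetter` is `Yes` for this record, the comparison defaults to treating a larger percentage as the better result.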