swe-bench/swe-bench-verified
Coding
SWE-bench Verified is a human-filtered subset of SWE-bench, built in collaboration with OpenAI, in which human software engineers confirmed that each real GitHub issue is solvable given the available repository context.
Run this task
CLI:
inspect eval inspect_harbor/swe_bench_verified --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import swe_bench_verified
eval(swe_bench_verified(), model="openai/gpt-5")
Dataset information
| Harbor registry | swe-bench/swe-bench-verified |
| Inspect task | swe_bench_verified |
| Latest digest | sha256:b934b0cc3dc800fe945eaf9f1623329db97ee3133c706d20644524c7759fb341 |
| Samples | 500 |
| Paper | arxiv |
| Source | https://github.com/SWE-bench/SWE-bench |
See Task Parameters for the parameter set shared across all Harbor tasks.
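With 500 samples, it can be worth doing a small smoke-test run before a full evaluation. The sketch below builds the CLI invocation shown under "Run this task" and optionally appends Inspect's `--limit` option to cap the number of samples. The helper name `build_command` and the constants are hypothetical and exist only for this illustration; it is also an assumption that `--limit` behaves for Harbor tasks as it does for other Inspect tasks.

```python
import shlex
from typing import List, Optional

# Values taken from this page; the constant names themselves are illustrative.
TASK = "inspect_harbor/swe_bench_verified"
TOTAL_SAMPLES = 500


def build_command(model: str, limit: Optional[int] = None) -> List[str]:
    """Build an `inspect eval` argument list, optionally capped at `limit` samples.

    Hypothetical helper: it only assembles arguments and does not run anything.
    """
    cmd = ["inspect", "eval", TASK, "--model", model]
    if limit is not None:
        # Reject caps outside the sample count listed in the dataset table.
        if not 1 <= limit <= TOTAL_SAMPLES:
            raise ValueError(f"limit must be between 1 and {TOTAL_SAMPLES}")
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command for a 5-sample smoke test.
    print(shlex.join(build_command("openai/gpt-5", limit=5)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the run. When calling from Python instead, `inspect_ai.eval` accepts a `limit` argument that serves the same purpose.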