swe-bench/swe-bench-verified

Coding

SWE-bench Verified is a human-filtered subset of SWE-bench, built in collaboration with OpenAI, in which human software engineers confirmed that each real GitHub issue is solvable given the available repository context.

Run this task

CLI:

inspect eval inspect_harbor/swe_bench_verified --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import swe_bench_verified

eval(swe_bench_verified(), model="openai/gpt-5")
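Before committing to all 500 samples, it can help to smoke-test on a handful. The sketch below only assembles the CLI invocation shown above and optionally appends Inspect's `--limit` option; the helper name `build_eval_command` is hypothetical and not part of inspect_harbor, and the limit of 5 is an arbitrary choice.

```python
import shlex
from typing import List, Optional

# Task path taken from the CLI example above.
TASK = "inspect_harbor/swe_bench_verified"


def build_eval_command(model: str, limit: Optional[int] = None) -> List[str]:
    """Assemble an `inspect eval` argument list (hypothetical helper).

    Assumes the Inspect CLI accepts `--limit N` to cap the number of samples.
    """
    cmd = ["inspect", "eval", TASK, "--model", model]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd


# Print a copy-pasteable command for a 5-sample smoke test.
print(shlex.join(build_eval_command("openai/gpt-5", limit=5)))
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting issues if the model name ever contains unusual characters.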

Dataset information

Harbor registry: swe-bench/swe-bench-verified
Inspect task: swe_bench_verified
Latest digest: sha256:b934b0cc3dc800fe945eaf9f1623329db97ee3133c706d20644524c7759fb341
Samples: 500
Paper: arXiv
Source: https://github.com/SWE-bench/SWE-bench

See Task Parameters for the parameter set shared across all Harbor tasks.