cais/swebenchpro
Coding
SWE-bench Pro with anti-exploitation measures (git history isolation and GitHub network blocking). 731 tasks across Python, JavaScript, TypeScript, and Go.
Run this task
CLI:
inspect eval inspect_harbor/cais_swebenchpro --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import cais_swebenchpro
eval(cais_swebenchpro(), model="openai/gpt-5")
Dataset information
| Harbor registry | cais/swebenchpro |
| Inspect task | cais_swebenchpro |
| Latest digest | sha256:0684038ce8eae92d435a27307d1c5843e291152898f429af130062e8df110768 |
| Samples | 731 |
| Paper | arxiv |
| Source | https://github.com/scaleapi/SWE-bench_Pro-os |
See Task Parameters for the parameter set shared across all Harbor tasks.
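With 731 samples, a full run is expensive, so it can help to smoke-test on a handful of samples first. The sketch below assembles the CLI invocation from "Run this task" with Inspect's `--limit` option appended. The helper name `build_command` is hypothetical and not part of `inspect_harbor`; the script only prints the command and does not execute it.

```python
import shlex
from typing import List, Optional

# Task path and default model are taken from the CLI example in "Run this task".
TASK = "inspect_harbor/cais_swebenchpro"


def build_command(model: str = "openai/gpt-5", limit: Optional[int] = None) -> List[str]:
    """Assemble an `inspect eval` argument list (hypothetical helper)."""
    cmd = ["inspect", "eval", TASK, "--model", model]
    if limit is not None:
        # --limit caps how many samples are evaluated.
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command for a 5-sample smoke test.
    print(shlex.join(build_command(limit=5)))
```

The resulting list can be passed to `subprocess.run` directly. From Python, the equivalent is to pass `limit=5` to `eval(...)`.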