datacurve/deep-swe
Coding
DeepSWE: Measuring frontier coding agents on original, long-horizon engineering tasks.
Run this task
CLI:

```bash
inspect eval inspect_harbor/datacurve_deep_swe --model openai/gpt-5
```

Python:

```python
from inspect_ai import eval
from inspect_harbor import datacurve_deep_swe

eval(datacurve_deep_swe(), model="openai/gpt-5")
```

Dataset information
| Field | Value |
|---|---|
| Harbor registry | `datacurve/deep-swe` |
| Inspect task | `datacurve_deep_swe` |
| Latest digest | `sha256:aaa82ceb8404dccc17689c9383f93dbcbc8f029a7601d2e3856a416f2cb89269` |
| Samples | 113 |
| Source | https://github.com/datacurve-ai/deep-swe |
See Task Parameters for the parameter set shared across all Harbor tasks.