datacurve/deep-swe

Coding

DeepSWE: Measuring frontier coding agents on original, long-horizon engineering tasks.


Run this task

CLI:

inspect eval inspect_harbor/datacurve_deep_swe --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import datacurve_deep_swe

eval(datacurve_deep_swe(), model="openai/gpt-5")
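Smoke test:

A full run covers all 113 samples, so it may be worth trying a handful first. The sketch below only assembles the CLI command shown above and optionally appends Inspect's `--limit` option; it does not launch anything. The helper name `build_command` and the constant `TASK` are hypothetical, introduced here for illustration.

```python
import shlex
from typing import List, Optional

# Task path as used in the CLI example above.
TASK = "inspect_harbor/datacurve_deep_swe"


def build_command(model: str, limit: Optional[int] = None) -> List[str]:
    """Assemble an `inspect eval` argument list for this task.

    `limit`, when given, is passed through as `--limit` so that only the
    first N samples are evaluated.
    """
    cmd = ["inspect", "eval", TASK, "--model", model]
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive integer")
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command line for a 5-sample trial run.
    print(shlex.join(build_command("openai/gpt-5", limit=5)))
```

The returned list can be handed to `subprocess.run` directly, which avoids shell quoting issues if the model name ever contains unusual characters.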

Dataset information

Harbor registry: datacurve/deep-swe
Inspect task: datacurve_deep_swe
Latest digest: sha256:aaa82ceb8404dccc17689c9383f93dbcbc8f029a7601d2e3856a416f2cb89269
Samples: 113
Source: https://github.com/datacurve-ai/deep-swe

See Task Parameters for the parameter set shared across all Harbor tasks.