Datasets

Inspect Harbor provides task functions for each dataset in the Harbor registry. You can import and use them directly:

from inspect_harbor import (
    terminal_bench,
    swebenchpro,
    swe_lancer_diamond,
    swebench_verified,
    # ... and many more
)

For a complete list of available datasets and versions (including swebenchpro, terminal-bench-pro, replicationbench, compilebench, and 40+ more), see the Registry.
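To see which task functions an installed copy of the package exposes, one option is to introspect the module. The sketch below is a minimal illustration and not part of Inspect Harbor: `list_task_functions` is a hypothetical helper, and it assumes the task functions are ordinary top-level names in `inspect_harbor` (as the import above suggests). It returns every public callable, so re-exported helpers or classes may show up alongside tasks, and names that the package loads lazily may be missing.

```python
import importlib
import inspect


def list_task_functions(module_name: str = "inspect_harbor") -> list[str]:
    """Return the sorted public callable names of a module (hypothetical helper)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Package not installed in this environment: nothing to list.
        return []
    return sorted(
        name
        for name, obj in inspect.getmembers(module, callable)
        if not name.startswith("_")
    )


names = list_task_functions()
print(f"{len(names)} public callables found")
```

The Registry remains the authoritative list; this only reflects what the installed version happens to export.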

Dataset Versioning

Each dataset has both unversioned and versioned task functions:

  • Unversioned functions (e.g. terminal_bench()) automatically use the latest version available in the registry.
  • Versioned functions (e.g. terminal_bench_2_0()) pin to a specific version for reproducibility.

Example:

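from inspect_ai import eval
from inspect_harbor import terminal_bench, terminal_bench_2_0

# Unversioned: resolves to the latest registry version (2.0 at the time of writing)
eval(terminal_bench(), model="openai/gpt-5-mini")

# Versioned: always runs version 2.0, even after newer versions are published
eval(terminal_bench_2_0(), model="openai/gpt-5-mini")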
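
The example above suggests a naming pattern: the dataset name and version are joined with underscores, with hyphens and dots replaced by underscores (terminal-bench at 2.0 becomes terminal_bench_2_0). The sketch below encodes that pattern. It is an inference from this one example rather than a documented rule, and `versioned_task_name` is a hypothetical helper, so confirm the actual function names against the Registry.

```python
import re


def versioned_task_name(dataset: str, version: str) -> str:
    """Guess the versioned task-function name (assumed pattern, not an official API)."""
    # Collapse each run of non-alphanumeric characters (hyphens, dots) into "_".
    base = re.sub(r"[^0-9a-zA-Z]+", "_", dataset).strip("_").lower()
    suffix = re.sub(r"[^0-9a-zA-Z]+", "_", version).strip("_").lower()
    return f"{base}_{suffix}"


print(versioned_task_name("terminal-bench", "2.0"))  # terminal_bench_2_0
```

A guessed name can then be looked up with `getattr(inspect_harbor, name, None)` to check whether that function exists before it is called.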

Known Dataset Issues

Dataset  | Issue                                                                                   | Status
ds_1000  | Multiple configuration issues preventing execution (pull access denied for ds1000)     | harbor-datasets#103