Inspect Harbor

Harbor is a framework for building, evaluating, and optimizing AI agents in containerized environments. Inspect Harbor provides an interface to run Harbor tasks using Inspect AI.

Installation

Install from PyPI:

pip install inspect-harbor

Or with uv:

uv add inspect-harbor

Prerequisites

Before running Harbor tasks, ensure you have:

Python 3.12 or higher – required by inspect_harbor.
Docker installed and running – required when using the Docker sandbox (the default).
Model API keys – set appropriate environment variables (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY).
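The prerequisites above can be checked with a short script before a long run. The sketch below is a hypothetical preflight helper and is not part of inspect_harbor. The function names are invented for illustration. It also assumes that OPENAI_API_KEY or ANTHROPIC_API_KEY is the key you need, so adjust that list to match your model provider.

```python
import os
import shutil
import subprocess
import sys

# Hypothetical preflight helper (NOT part of inspect_harbor): reports which of the
# README's prerequisites look unmet on this machine.
MIN_PYTHON = (3, 12)
# Assumed key names (the README's examples); which one you need depends on --model.
KNOWN_KEY_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def check_prerequisites(version_info=None, environ=None, docker_path=None, use_docker=True):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    version_info = sys.version_info if version_info is None else version_info
    environ = os.environ if environ is None else environ
    problems = []

    # 1. Python version must be at least 3.12.
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )

    # 2. Docker CLI on PATH (only needed for the default Docker sandbox).
    if use_docker:
        if docker_path is None:
            docker_path = shutil.which("docker")
        if not docker_path:
            problems.append("docker executable not found on PATH")

    # 3. At least one of the assumed model API keys is set and non-empty.
    if not any(environ.get(name) for name in KNOWN_KEY_VARS):
        problems.append("no model API key found (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY)")
    return problems


def docker_daemon_running():
    """`docker info` exits non-zero when the Docker daemon is unreachable."""
    try:
        result = subprocess.run(["docker", "info"], capture_output=True, timeout=30)
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


if __name__ == "__main__":
    issues = check_prerequisites()
    if not docker_daemon_running():
        issues.append("Docker daemon does not appear to be running")
    for issue in issues:
        print("missing:", issue)
    if not issues:
        print("all prerequisite checks passed")
```

Detecting the `docker` binary is kept separate from the daemon check so the core logic can be tested without Docker installed.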

Quick Start

The fastest way to get started is to run a dataset from the Harbor registry.

CLI:

# Run Aider's Polyglot coding benchmark
inspect eval inspect_harbor/aider_polyglot --model openai/gpt-5-mini

# Run Terminal-Bench 2.0
inspect eval inspect_harbor/terminal_bench_2 --model openai/gpt-5

Python API:

from inspect_ai import eval
from inspect_harbor import aider_polyglot, terminal_bench_2

# Run Aider's Polyglot coding benchmark
eval(aider_polyglot(), model="openai/gpt-5-mini")

# Run Terminal-Bench 2.0
eval(terminal_bench_2(), model="openai/gpt-5")

What this does

Loads the dataset from the Harbor registry.
Downloads and caches all tasks in the dataset.
Solves the tasks with the default ReAct agent scaffold.
Executes in a Docker sandbox environment.
Stores results in ./logs.
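Once a run finishes, the newest file in ./logs is usually the one you want to inspect. The stdlib sketch below locates it. The helper name `latest_log` is hypothetical. It assumes that Inspect writes one log file per eval run, with a `.eval` suffix by default or `.json` when JSON logging is selected. The directory may also contain other files, so the suffix filter is only a heuristic.

```python
from pathlib import Path

# Assumed log suffixes: `.eval` (Inspect's default format) and `.json`.
LOG_SUFFIXES = (".eval", ".json")


def latest_log(log_dir="./logs"):
    """Hypothetical helper: return the most recently modified log file, or None."""
    path = Path(log_dir)
    if not path.is_dir():
        return None
    # Keep only regular files whose suffix looks like an Inspect log.
    candidates = [p for p in path.iterdir() if p.is_file() and p.suffix in LOG_SUFFIXES]
    # max(..., default=None) handles an empty directory without raising.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    log = latest_log()
    print(log if log else "no logs found in ./logs")
```

To browse the results interactively, `inspect view` opens Inspect's log viewer. `inspect_ai.log.read_eval_log` can load a log file in Python.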

See the Registry for the full list of available datasets, and the Using Harbor guides for more detail on datasets, task parameters, agents, and advanced features.