Inspect Harbor
Harbor is a framework for building, evaluating, and optimizing AI agents in containerized environments. Inspect Harbor provides an interface to run Harbor tasks using Inspect AI.
Installation
Install from PyPI:
pip install inspect-harbor
Or with uv:
uv add inspect-harbor
Prerequisites
Before running Harbor tasks, ensure you have:
- Python 3.12 or higher – required by inspect_harbor.
- Docker installed and running – required when tasks execute in the Docker sandbox (the default).
- Model API keys – set appropriate environment variables (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY).
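The prerequisites above can be checked before a run with a small standard-library script. This is only a sketch: the helper name check_prerequisites and its parameters are hypothetical and not part of inspect_harbor, the Docker check only confirms that the docker CLI is on PATH (not that the daemon is running), and which API key you actually need depends on the model provider you choose.

```python
import os
import shutil
import sys

# Hypothetical helper for illustration; not part of inspect_harbor or Inspect AI.
MIN_PYTHON = (3, 12)
# Example key names taken from the prerequisites list; other providers use other variables.
API_KEY_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def check_prerequisites(env=None, python_version=None, docker_found=None):
    """Return a list of human-readable problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    if docker_found is None:
        # Only verifies the docker executable is on PATH, not that the daemon is up.
        docker_found = shutil.which("docker") is not None

    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append("Python 3.12 or higher is required")
    if not docker_found:
        problems.append("docker executable not found on PATH")
    if not any(env.get(name) for name in API_KEY_VARS):
        problems.append("no model API key set (" + ", ".join(API_KEY_VARS) + ")")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

The arguments exist only so the checks can be exercised without touching the real environment; calling check_prerequisites() with no arguments inspects the current interpreter, PATH, and environment variables.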
Quick Start
The fastest way to get started is to run a dataset from the Harbor registry.
CLI:
# Run Aider's Polyglot coding benchmark
inspect eval inspect_harbor/aider_polyglot --model openai/gpt-5-mini
# Run Terminal-Bench 2.0
inspect eval inspect_harbor/terminal_bench_2 --model openai/gpt-5
Python API:
from inspect_ai import eval
from inspect_harbor import aider_polyglot, terminal_bench_2
# Run Aider's Polyglot coding benchmark
eval(aider_polyglot(), model="openai/gpt-5-mini")
# Run Terminal-Bench 2.0
eval(terminal_bench_2(), model="openai/gpt-5")
What this does
- Loads the dataset from the Harbor registry.
- Downloads and caches all tasks in the dataset.
- Solves the tasks with the default ReAct agent scaffold.
- Executes in a Docker sandbox environment.
- Stores results in ./logs.
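After a run, it can be handy to see which log files were written. The following is a minimal sketch using only the standard library: the helper name list_log_files is hypothetical, and the .eval and .json suffixes are an assumption about the log file formats in use, so adjust them if your logs differ.

```python
from pathlib import Path


def list_log_files(log_dir="./logs", suffixes=(".eval", ".json")):
    """Return log files directly under log_dir, newest first.

    Hypothetical helper for illustration; the suffixes are an assumption
    about the log formats. Returns an empty list if the directory is missing.
    """
    root = Path(log_dir)
    if not root.is_dir():
        return []
    files = [p for p in root.iterdir() if p.is_file() and p.suffix in suffixes]
    # Sort by modification time so the most recent run appears first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in list_log_files():
        print(path)
```

This only lists files; it does not parse them.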
See the Registry for the full list of available datasets, and the Using Harbor guides for more detail on datasets, task parameters, agents, and advanced features.