agentic-labs/erp-bench

Professional

ERP-Bench is the Odoo 19 benchmark used in the Anchor paper, “Preventing Artifact Drift in Agent Benchmark Generation.” It contains 300 long-horizon procurement and manufacturing tasks generated from a single CP-SAT-backed specification.

Run this task

CLI:

```shell
inspect eval inspect_harbor/agentic_labs_erp_bench --model openai/gpt-5
```

Python:

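```python
# Note: this import shadows Python's built-in eval() within this module.
from inspect_ai import eval
from inspect_harbor import agentic_labs_erp_bench

# Run the ERP-Bench task against the chosen model.
eval(agentic_labs_erp_bench(), model="openai/gpt-5")
```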

Dataset information

Harbor registry	agentic-labs/erp-bench
Inspect task	`agentic_labs_erp_bench`
Latest digest	sha256:d3005dc4ff8a5cdba0efbb46e8dae9af29dc63f58255dd2e8329d4785d311605
Samples	300
Paper	arxiv
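
If you record the digest above to pin a dataset version, it can help to check that the string is well formed before storing it. The following is a minimal sketch, not part of Harbor or Inspect: the helper name `is_valid_sha256_digest` and the constant `LATEST_DIGEST` are hypothetical, and the format it checks (`sha256:` followed by 64 lowercase hex characters) is an assumption based on the digest shown here.

```python
import re

# Assumed format: "sha256:" followed by exactly 64 lowercase hex characters.
_DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def is_valid_sha256_digest(digest: str) -> bool:
    """Return True if `digest` looks like a well-formed sha256 digest string.

    Hypothetical helper for illustration; it checks only the shape of the
    string, not whether the digest matches any dataset contents.
    """
    return _DIGEST_RE.fullmatch(digest) is not None


# Digest copied from the "Latest digest" row on this page.
LATEST_DIGEST = (
    "sha256:d3005dc4ff8a5cdba0efbb46e8dae9af29dc63f58255dd2e8329d4785d311605"
)

if __name__ == "__main__":
    print(is_valid_sha256_digest(LATEST_DIGEST))
```

This catches only copy-and-paste damage such as a truncated or re-cased string; it does not confirm that the registry actually serves that digest.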

See Task Parameters for the parameter set shared across all Harbor tasks.