agentic-labs/erp-bench

Professional

ERP-Bench is the Odoo 19 benchmark used in the Anchor paper, “Preventing Artifact Drift in Agent Benchmark Generation.” It contains 300 long-horizon procurement and manufacturing tasks generated from a single CP-SAT-backed specification.


Run this task

CLI:

inspect eval inspect_harbor/agentic_labs_erp_bench --model openai/gpt-5

Python:

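# Inspect's evaluation entry point and the ERP-Bench task from the Harbor registry package.
from inspect_ai import eval
from inspect_harbor import agentic_labs_erp_bench

# Run the full task against the chosen model.
eval(agentic_labs_erp_bench(), model="openai/gpt-5")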

Dataset information

Harbor registry: agentic-labs/erp-bench
Inspect task: agentic_labs_erp_bench
Latest digest: sha256:d3005dc4ff8a5cdba0efbb46e8dae9af29dc63f58255dd2e8329d4785d311605
Samples: 300
Paper: arXiv
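
The digest above can be used to pin a specific dataset version. The following is a minimal sketch of checking that a recorded digest string is well formed and comparing it against locally available bytes. It assumes the digest is a plain SHA-256 over some artifact; this page does not say which bytes Harbor hashes, so `payload` is hypothetical, and the helper names `parse_digest` and `matches_digest` are illustrative and not part of Inspect or Harbor.

```python
import hashlib
import re

# Digest string copied from the "Latest digest" entry above.
LATEST_DIGEST = "sha256:d3005dc4ff8a5cdba0efbb46e8dae9af29dc63f58255dd2e8329d4785d311605"


def parse_digest(digest: str) -> tuple[str, str]:
    """Split an '<algorithm>:<hex>' digest and validate the sha256 form."""
    algorithm, sep, hex_value = digest.partition(":")
    if not sep or algorithm != "sha256":
        raise ValueError(f"unsupported digest: {digest!r}")
    if not re.fullmatch(r"[0-9a-f]{64}", hex_value):
        raise ValueError(f"malformed sha256 value: {hex_value!r}")
    return algorithm, hex_value


def matches_digest(payload: bytes, digest: str) -> bool:
    """Return True if the SHA-256 of payload equals the recorded digest.

    Assumption: the digest covers exactly these bytes. Which artifact
    Harbor actually hashes is not specified on this page.
    """
    _, expected = parse_digest(digest)
    return hashlib.sha256(payload).hexdigest() == expected


if __name__ == "__main__":
    algorithm, hex_value = parse_digest(LATEST_DIGEST)
    print(algorithm, len(hex_value))
```

Recording the parsed digest alongside evaluation results makes it possible to tell later whether two runs used the same dataset version.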

See Task Parameters for the parameter set shared across all Harbor tasks.