agentic-labs/erp-bench
Professional
ERP-Bench is the Odoo 19 benchmark used in the Anchor paper, “Preventing Artifact Drift in Agent Benchmark Generation.” It contains 300 long-horizon procurement and manufacturing tasks generated from a single CP-SAT-backed specification.
Run this task
CLI:
inspect eval inspect_harbor/agentic_labs_erp_bench --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import agentic_labs_erp_bench
eval(agentic_labs_erp_bench(), model="openai/gpt-5")
Dataset information
| Field | Value |
| --- | --- |
| Harbor registry | agentic-labs/erp-bench |
| Inspect task | agentic_labs_erp_bench |
| Latest digest | sha256:d3005dc4ff8a5cdba0efbb46e8dae9af29dc63f58255dd2e8329d4785d311605 |
| Samples | 300 |
| Paper | arxiv |
See Task Parameters for the parameter set shared across all Harbor tasks.