theagentcompany/theagentcompany
Professional
Assistants
Coding
An agent benchmark with tasks in a simulated software company across GitLab, Plane, OwnCloud, and RocketChat services, evaluating LLM agents on real-world professional work.
Run this task
CLI:
inspect eval inspect_harbor/theagentcompany --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import theagentcompany
eval(theagentcompany(), model="openai/gpt-5")
Dataset information
| Harbor registry | theagentcompany/theagentcompany |
| Inspect task | theagentcompany |
| Latest digest | sha256:f31a4e945df5e24664acd8170637fdc774b6d593aac6274a1580fc7f2bb76cd0 |
| Samples | 174 |
| Paper | arxiv |
| Source | https://github.com/TheAgentCompany/TheAgentCompany |
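If you record the digest from the table in run notes or a lockfile, a quick format check can catch copy-paste truncation. The sketch below assumes the digest follows the common `algorithm:hex` layout with a 64-character lowercase hex value for SHA-256; `parse_digest` is a hypothetical helper, not part of Inspect or Harbor.

```python
import re

# Digest value copied from the "Latest digest" row of the table.
DIGEST = "sha256:f31a4e945df5e24664acd8170637fdc774b6d593aac6274a1580fc7f2bb76cd0"


def parse_digest(digest: str) -> tuple[str, str]:
    """Split an `algorithm:hex` digest and sanity-check the SHA-256 case.

    Hypothetical helper: it only checks the string format and does not
    verify the digest against any downloaded data.
    """
    algorithm, sep, hex_value = digest.partition(":")
    if not sep or not algorithm or not hex_value:
        raise ValueError(f"expected 'algorithm:hex', got {digest!r}")
    # Assumption: a SHA-256 digest is written as 64 lowercase hex characters.
    if algorithm == "sha256" and not re.fullmatch(r"[0-9a-f]{64}", hex_value):
        raise ValueError("sha256 digest must be 64 lowercase hex characters")
    return algorithm, hex_value


if __name__ == "__main__":
    algorithm, hex_value = parse_digest(DIGEST)
    print(algorithm, len(hex_value))
```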
See Task Parameters for the parameter set shared across all Harbor tasks.
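To run the same task against several models from a script, the CLI line under "Run this task" can be assembled programmatically. This is a minimal sketch using only the task path and the `--model` flag shown above; `build_command` and `run` are hypothetical helper names, and the default is a dry run that only prints the command.

```python
import shlex
import shutil
import subprocess

# Task path taken from the CLI line in "Run this task".
TASK = "inspect_harbor/theagentcompany"


def build_command(model: str) -> list[str]:
    """Return the `inspect eval` argument list for the given model."""
    return ["inspect", "eval", TASK, "--model", model]


def run(model: str, dry_run: bool = True) -> str:
    """Print the command line; execute it only when dry_run is False.

    Execution is skipped if the `inspect` executable is not on PATH.
    """
    cmd = build_command(model)
    line = shlex.join(cmd)
    print(line)
    if not dry_run and shutil.which("inspect") is not None:
        subprocess.run(cmd, check=True)
    return line


if __name__ == "__main__":
    # Dry run: prints the command without starting an evaluation.
    run("openai/gpt-5")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a model name ever contains characters the shell treats specially.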