MichaelY310/devopsgym
Coding
Professional
DevOps-Gym benchmark adapted to Harbor format - 729 tasks across 5 categories: Build, Monitoring, Issue Resolving, Test Generation, and End-to-End.
Run this task
CLI:
inspect eval inspect_harbor/michaely310_devopsgym --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import michaely310_devopsgym
eval(michaely310_devopsgym(), model="openai/gpt-5")
Dataset information
| Field | Value |
| --- | --- |
| Harbor registry | MichaelY310/devopsgym |
| Inspect task | michaely310_devopsgym |
| Latest digest | sha256:cab4ecd10b85e8b8ae13bcedf4bafe3ba4f6816fbef77a6c8c7b3443c01f6a03 |
| Samples | 728 |
| Paper | arxiv |
| Source | https://github.com/ucsb-mlsec/DevOps-Gym |
See Task Parameters for the parameter set shared across all Harbor tasks.