MichaelY310/devopsgym

Coding
Professional

DevOps-Gym benchmark adapted to Harbor format - 729 tasks across 5 categories: Build, Monitoring, Issue Resolving, Test Generation, and End-to-End.

Run this task

CLI:

inspect eval inspect_harbor/michaely310_devopsgym --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import michaely310_devopsgym

eval(michaely310_devopsgym(), model="openai/gpt-5")

Dataset information

Harbor registry: MichaelY310/devopsgym
Inspect task: michaely310_devopsgym
Latest digest: sha256:cab4ecd10b85e8b8ae13bcedf4bafe3ba4f6816fbef77a6c8c7b3443c01f6a03
Samples: 728
Paper: arXiv
Source: https://github.com/ucsb-mlsec/DevOps-Gym
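
The registry name and the Inspect task name listed above appear to be related by a simple rule: lowercase the registry name and replace the slash with an underscore. The sketch below illustrates that inferred rule and checks the shape of the digest string. The helper names `harbor_to_inspect_task` and `is_sha256_digest` are hypothetical and not part of `inspect_harbor`. The rule is an assumption based only on the single name pair on this page.

```python
import re


def harbor_to_inspect_task(registry_name: str) -> str:
    """Hypothetical helper: derive an Inspect task name from a Harbor registry name.

    Assumed rule (inferred from this page only): lowercase the name and
    collapse every run of non-alphanumeric characters into one underscore.
    """
    return re.sub(r"[^a-z0-9]+", "_", registry_name.lower()).strip("_")


def is_sha256_digest(digest: str) -> bool:
    """Return True if the string is 'sha256:' followed by 64 lowercase hex digits."""
    return re.fullmatch(r"sha256:[0-9a-f]{64}", digest) is not None


task_name = harbor_to_inspect_task("MichaelY310/devopsgym")

# Argument list matching the CLI invocation shown in "Run this task".
command = ["inspect", "eval", f"inspect_harbor/{task_name}", "--model", "openai/gpt-5"]

digest_ok = is_sha256_digest(
    "sha256:cab4ecd10b85e8b8ae13bcedf4bafe3ba4f6816fbef77a6c8c7b3443c01f6a03"
)
```

The `command` list can be passed to `subprocess.run` as-is if the `inspect` CLI is installed. Building it as a list rather than a single string avoids shell-quoting problems.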

See Task Parameters for the parameter set shared across all Harbor tasks.