quesma/otel-bench
Coding
AI-agent benchmark for OpenTelemetry instrumentation tasks across multiple programming languages.
Run this task
CLI:

```bash
inspect eval inspect_harbor/quesma_otel_bench --model openai/gpt-5
```

Python:

```python
from inspect_ai import eval
from inspect_harbor import quesma_otel_bench

eval(quesma_otel_bench(), model="openai/gpt-5")
```

Dataset information
| Field | Value |
|---|---|
| Harbor registry | `quesma/otel-bench` |
| Inspect task | `quesma_otel_bench` |
| Latest digest | `sha256:a6ca75f833dedb831238b42c5dccab7f4d95713db9f6933560a6cca2c052b4b9` |
| Samples | 26 |
| Source | https://github.com/QuesmaOrg/otel-bench |
See Task Parameters for the parameter set shared across all Harbor tasks.
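To compare several models, or to smoke-test on a few of the 26 samples before a full run, you can assemble the CLI call from Python. This is a minimal sketch that assumes Inspect's `--limit` option caps the number of samples evaluated. `build_eval_argv` and `run_sweep` are hypothetical helpers, not part of `inspect_harbor`. With `dry_run=True` (the default), the sketch only returns the commands and does not execute anything.

```python
import shlex
import subprocess
from typing import List, Optional

TASK = "inspect_harbor/quesma_otel_bench"


def build_eval_argv(model: str, limit: Optional[int] = None) -> List[str]:
    """Hypothetical helper: return the argv for `inspect eval` on this task.

    `limit` (assumed to map to Inspect's `--limit` flag) caps how many
    samples are run, which is handy for a quick smoke test.
    """
    argv = ["inspect", "eval", TASK, "--model", model]
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive integer")
        argv += ["--limit", str(limit)]
    return argv


def run_sweep(
    models: List[str], limit: Optional[int] = None, dry_run: bool = True
) -> List[str]:
    """Hypothetical helper: build (and optionally launch) one run per model.

    Returns the shell-quoted command strings. Runs execute sequentially
    only when dry_run is False.
    """
    commands = []
    for model in models:
        argv = build_eval_argv(model, limit)
        commands.append(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if a run exits non-zero
            subprocess.run(argv, check=True)
    return commands
```

For example, `run_sweep(["openai/gpt-5"], limit=3)` returns `["inspect eval inspect_harbor/quesma_otel_bench --model openai/gpt-5 --limit 3"]`. Pass `dry_run=False` to launch each run through `subprocess.run`.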