quesma/otel-bench

Coding

AI-agent benchmark for OpenTelemetry instrumentation tasks across multiple programming languages.

Run this task

CLI:

```
inspect eval inspect_harbor/quesma_otel_bench --model openai/gpt-5
```

Python:

```python
from inspect_ai import eval
from inspect_harbor import quesma_otel_bench

eval(quesma_otel_bench(), model="openai/gpt-5")
```
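For a quick smoke test before running the full sample set, the CLI invocation can also be assembled programmatically. The following is a minimal sketch, not part of Inspect or `inspect_harbor`: `build_inspect_command` is a hypothetical helper, and it assumes Inspect's `--limit` option is used to cap how many samples are evaluated.

```python
import shlex
from typing import List, Optional

# Task name as given on this page.
TASK = "inspect_harbor/quesma_otel_bench"


def build_inspect_command(model: str, limit: Optional[int] = None) -> List[str]:
    """Hypothetical helper: build the argv list for an `inspect eval` run.

    `limit` (an assumption for illustration) maps to the `--limit` option,
    which restricts the run to the first N samples.
    """
    if limit is not None and limit < 1:
        raise ValueError("limit must be a positive integer")
    argv = ["inspect", "eval", TASK, "--model", model]
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv


if __name__ == "__main__":
    # Print a shell-ready command line for a 5-sample run.
    print(shlex.join(build_inspect_command("openai/gpt-5", limit=5)))
```

The returned list can be passed directly to `subprocess.run(...)`. Building an argument list rather than a single string avoids shell-quoting problems if a model name contains unusual characters.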

Harbor registry	quesma/otel-bench
Inspect task	`quesma_otel_bench`
Latest digest	sha256:a6ca75f833dedb831238b42c5dccab7f4d95713db9f6933560a6cca2c052b4b9
Samples	26
Source	https://github.com/QuesmaOrg/otel-bench

See Task Parameters for the parameter set shared across all Harbor tasks.