kumo/kumo-hard

Reasoning

KUMO (hard split): hard-difficulty, procedurally generated reasoning tasks from the KUMO benchmark, spanning 100 domains.

Run this task

CLI:

inspect eval inspect_harbor/kumo_hard --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import kumo_hard

eval(kumo_hard(), model="openai/gpt-5")
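
The hard split contains 250 samples, so a short trial on a subset can be useful before a full run. The sketch below assumes the `limit` option of `inspect_ai`'s `eval()`, which caps the number of samples evaluated. The helper names `smoke_test_kwargs` and `run_smoke_test` are hypothetical and not part of `inspect_harbor`.

```python
def smoke_test_kwargs(model: str = "openai/gpt-5", limit: int = 10) -> dict:
    """Build keyword arguments for a small trial run (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return {"model": model, "limit": limit}


def run_smoke_test(limit: int = 10):
    """Evaluate only the first `limit` samples of the hard split."""
    # Imported lazily so this module loads even without the packages installed.
    from inspect_ai import eval
    from inspect_harbor import kumo_hard

    return eval(kumo_hard(), **smoke_test_kwargs(limit=limit))
```

Calling `run_smoke_test(5)` needs both packages installed and credentials for the chosen model. On the command line, the same cap can be applied by adding `--limit 5` to the `inspect eval` command.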

Harbor registry	kumo/kumo-hard
Inspect task	`kumo_hard`
Latest digest	sha256:6f175f28349747cc2b018c23e3f60aeafa1ab2c331fc389d69b9308eb68bf458
Samples	250
Paper	arxiv
Source	https://github.com/linhaowei1/kumo

See Task Parameters for the parameter set shared across all Harbor tasks.