kumo/kumo-hard

Reasoning

KUMO (hard split): hard-difficulty, procedurally generated reasoning tasks from the KUMO benchmark, spanning 100 domains.

Run this task

CLI:

inspect eval inspect_harbor/kumo_hard --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import kumo_hard

eval(kumo_hard(), model="openai/gpt-5")

Dataset information

Harbor registry: kumo/kumo-hard
Inspect task: kumo_hard
Latest digest: sha256:6f175f28349747cc2b018c23e3f60aeafa1ab2c331fc389d69b9308eb68bf458
Samples: 250
Paper: arXiv
Source: https://github.com/linhaowei1/kumo
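
If you record the digest alongside your results for reproducibility, it can help to check that the copied string is well formed first. The following is a minimal sketch, assuming the digest uses the common `algorithm:hex` form; the helper `parse_digest` is hypothetical and is not part of Inspect or Harbor.

```python
import re

# Digest string copied verbatim from the "Latest digest" field above.
LATEST_DIGEST = (
    "sha256:6f175f28349747cc2b018c23e3f60aeafa1ab2c331fc389d69b9308eb68bf458"
)


def parse_digest(digest: str) -> tuple[str, str]:
    """Split an 'algorithm:hex' digest and sanity-check its shape.

    Hypothetical helper for illustration only; assumes the 'algorithm:hex' form.
    """
    algorithm, sep, hex_value = digest.partition(":")
    if not sep or not algorithm or not hex_value:
        raise ValueError(f"expected 'algorithm:hex', got {digest!r}")
    # A SHA-256 digest is 32 bytes, i.e. 64 lowercase hex characters.
    if algorithm == "sha256" and not re.fullmatch(r"[0-9a-f]{64}", hex_value):
        raise ValueError("sha256 digest must be 64 lowercase hex characters")
    return algorithm, hex_value


algorithm, hex_value = parse_digest(LATEST_DIGEST)
print(algorithm, len(hex_value))
```

This only checks the shape of the string. It does not confirm that the digest matches the dataset content held in the registry.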

See Task Parameters for the parameter set shared across all Harbor tasks.