kumo@hard
Reasoning
KUMO(hard) split (250 tasks; 50 instances per scenario).
Run this task
CLI:
inspect eval inspect_harbor/kumo_hard --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import kumo_hard
eval(kumo_hard(), model="openai/gpt-5")
Dataset information
| Field | Value |
|---|---|
| Harbor registry | kumo@hard |
| Inspect task | kumo_hard |
| Version | hard |
| Samples | 250 |
See Task Parameters for the parameter set shared across all Harbor tasks.
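Because the split has 250 samples, it can be handy to try a few before a full run. The sketch below is one way to do that, assuming Inspect's standard sample cap (`--limit` on the CLI, `limit=` in `eval()`) applies to this task as it does to other Inspect tasks. The helper names `build_command` and `run_subset` are hypothetical and not part of `inspect_harbor`.

```python
import shlex

TASK = "inspect_harbor/kumo_hard"  # task name from the CLI example
MODEL = "openai/gpt-5"             # model from the CLI example


def build_command(limit=None):
    """Return the `inspect eval` argv, optionally capped at `limit` samples.

    Hypothetical helper for illustration only.
    """
    argv = ["inspect", "eval", TASK, "--model", MODEL]
    if limit is not None:
        # Assumption: Inspect's generic --limit option is honored by this task.
        argv += ["--limit", str(limit)]
    return argv


def run_subset(limit):
    """Run the first `limit` samples through the Python API (hypothetical helper)."""
    # Imported lazily so build_command works without the packages installed.
    from inspect_ai import eval
    from inspect_harbor import kumo_hard

    return eval(kumo_hard(), model=MODEL, limit=limit)


if __name__ == "__main__":
    # Print the equivalent shell command for a 5-sample trial run.
    print(shlex.join(build_command(limit=5)))
```

Calling `run_subset(5)` would then evaluate only five of the 250 samples. Dropping the limit gives the same full run as the CLI and Python examples.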