reasoning-gym/reasoning-gym-hard

Reasoning

Reasoning Gym (hard split): procedurally generated, algorithmically verifiable reasoning tasks at a harder difficulty setting, spanning 90+ task families.

Run this task

CLI:

inspect eval inspect_harbor/reasoning_gym_hard --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import reasoning_gym_hard

eval(reasoning_gym_hard(), model="openai/gpt-5")
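
Before committing to all 288 samples, it can be useful to run a small subset first. The sketch below wraps the call above in a hypothetical helper, `smoke_test` (not part of `inspect_harbor`), and passes the `limit` argument of `inspect_ai.eval` to cap the number of samples. The default of 5 samples is an arbitrary choice for illustration.

```python
def smoke_test(model: str = "openai/gpt-5", limit: int = 5):
    """Evaluate only the first `limit` samples and return the eval logs.

    `smoke_test` is an illustrative helper, not an inspect_harbor API.
    """
    # Imported lazily so this module loads even where the packages
    # are not installed; the import only happens when the helper runs.
    from inspect_ai import eval
    from inspect_harbor import reasoning_gym_hard

    # `limit` caps how many samples are evaluated (the full split has 288).
    return eval(reasoning_gym_hard(), model=model, limit=limit)
```

Calling `smoke_test()` returns a list of eval logs, one per task; checking each log's `status` field should show whether the run completed before you launch the full evaluation.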

Dataset information

Harbor registry: reasoning-gym/reasoning-gym-hard
Inspect task: reasoning_gym_hard
Latest digest: sha256:e1bb328c340b122bb398f230ad46697bff8e77ad49ed3923a9caecaa3d6c8611
Samples: 288
Paper: arXiv
Source: https://github.com/open-thought/reasoning-gym

See Task Parameters for the parameter set shared across all Harbor tasks.