reasoning-gym/reasoning-gym-hard
Reasoning
Reasoning Gym (hard split): procedurally generated, algorithmically verifiable reasoning tasks at higher difficulty across 90+ task families.
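As a rough illustration of what "procedurally generated" and "algorithmically verifiable" mean, the toy sketch below builds a question from a seed and scores an answer by exact match. The names `generate_task` and `score_answer`, and the `num_terms` difficulty knob, are hypothetical and chosen for this example only; they are not part of the Reasoning Gym or Harbor APIs, and the real task families are far more varied.

```python
import random


def generate_task(seed: int, num_terms: int = 4) -> dict:
    """Hypothetical generator: build one addition question from a seed.

    The same seed always yields the same question, and a larger
    num_terms stands in for a harder difficulty setting.
    """
    rng = random.Random(seed)
    terms = [rng.randint(10, 99) for _ in range(num_terms)]
    question = " + ".join(str(t) for t in terms) + " = ?"
    # The reference answer is computed, not hand-labelled.
    return {"question": question, "answer": str(sum(terms))}


def score_answer(task: dict, model_output: str) -> float:
    """Hypothetical verifier: exact match after trimming whitespace."""
    return 1.0 if model_output.strip() == task["answer"] else 0.0


# A "harder" instance under this toy scheme simply has more terms.
task = generate_task(seed=0, num_terms=6)
print(task["question"])
print(score_answer(task, task["answer"]))
```

Because the reference answer is derived from the same procedure that builds the question, scoring needs no human grader, and new instances can be produced by changing the seed.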
Run this task
CLI:
inspect eval inspect_harbor/reasoning_gym_hard --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import reasoning_gym_hard
eval(reasoning_gym_hard(), model="openai/gpt-5")
Dataset information
| Harbor registry | reasoning-gym/reasoning-gym-hard |
| Inspect task | reasoning_gym_hard |
| Latest digest | sha256:e1bb328c340b122bb398f230ad46697bff8e77ad49ed3923a9caecaa3d6c8611 |
| Samples | 288 |
| Paper | arXiv |
| Source | https://github.com/open-thought/reasoning-gym |
See Task Parameters for the parameter set shared across all Harbor tasks.