maxbittker/runebench
Reasoning
Behavior
Benchmark suite for evaluating AI agents on RuneScape gameplay tasks.
Run this task
CLI:

    inspect eval inspect_harbor/maxbittker_runebench --model openai/gpt-5

Python:

    from inspect_ai import eval
    from inspect_harbor import maxbittker_runebench

    eval(maxbittker_runebench(), model="openai/gpt-5")

Dataset information
| Field | Value |
| --- | --- |
| Harbor registry | maxbittker/runebench |
| Inspect task | maxbittker_runebench |
| Latest digest | sha256:4bb3430af2ef3a320bd3dfeeab2447fbf9e0093452ad747997186a85a060de28 |
| Samples | 32 |
| Source | https://github.com/MaxBittker/rs-sdk |

See Task Parameters for the parameter set shared across all Harbor tasks.
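
With 32 samples in the dataset, it may be useful to try a small subset before a full run. The following is a minimal sketch, not part of the Harbor or Inspect documentation for this task: it only assembles the `inspect eval` command line shown under "Run this task" and optionally appends Inspect's `--limit` option. The helper name `build_eval_command` is hypothetical, and the sketch builds the command without executing it.

```python
import shlex
from typing import List, Optional

# Task path copied from the CLI command in "Run this task".
TASK = "inspect_harbor/maxbittker_runebench"


def build_eval_command(model: str, limit: Optional[int] = None) -> List[str]:
    """Return the argv for `inspect eval` on this task (hypothetical helper)."""
    argv = ["inspect", "eval", TASK, "--model", model]
    if limit is not None:
        # Assumption: --limit is used here to evaluate only the first N samples.
        argv += ["--limit", str(limit)]
    return argv


if __name__ == "__main__":
    # Print a shell-ready command; pass the list to subprocess.run to execute it.
    print(shlex.join(build_eval_command("openai/gpt-5", limit=2)))
```

Returning the arguments as a list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting concerns.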