minnesotanlp/aar
Reasoning
Assistants
The Amazing Agent Race (AAR): 1400 multi-step scavenger-hunt puzzles for evaluating LLM agents on tool use, web navigation, and arithmetic reasoning. Includes linear (800) and DAG (600) variants across 4 difficulty levels.
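To make the linear versus DAG distinction concrete, here is a minimal toy sketch. It assumes a puzzle can be modelled as steps that depend on the answers of earlier steps. The step names, the dictionary layout, and the helpers `solve_order` and `is_linear` are hypothetical illustrations, not the dataset's actual schema or the authors' code. Only the variant counts come from the description above.

```python
from graphlib import TopologicalSorter

# Variant counts as stated in the task description.
VARIANT_COUNTS = {"linear": 800, "dag": 600}
TOTAL_PUZZLES = sum(VARIANT_COUNTS.values())

# Hypothetical toy puzzles: each step maps to the set of steps whose
# answers it needs. Step names are illustrative, not taken from AAR.
linear_puzzle = {
    "find_page": set(),
    "extract_year": {"find_page"},
    "add_offset": {"extract_year"},
}

dag_puzzle = {
    "find_city": set(),
    "find_population": {"find_city"},
    "find_elevation": {"find_city"},
    "combine": {"find_population", "find_elevation"},
}


def solve_order(puzzle):
    """Return one valid order in which the steps can be resolved."""
    return list(TopologicalSorter(puzzle).static_order())


def is_linear(puzzle):
    """True if every step has at most one prerequisite and at most one dependent."""
    dependents = {}
    for step, deps in puzzle.items():
        if len(deps) > 1:
            return False
        for dep in deps:
            dependents[dep] = dependents.get(dep, 0) + 1
    return all(count <= 1 for count in dependents.values())


if __name__ == "__main__":
    print(TOTAL_PUZZLES)
    print(solve_order(linear_puzzle), is_linear(linear_puzzle))
    print(solve_order(dag_puzzle), is_linear(dag_puzzle))
```

In this toy model a linear puzzle has exactly one valid solving order, whereas a DAG puzzle lets independent branches (here the population and elevation lookups) be resolved in either order before they are combined.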
Run this task
CLI:
inspect eval inspect_harbor/minnesotanlp_aar --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import minnesotanlp_aar
eval(minnesotanlp_aar(), model="openai/gpt-5")
Dataset information
| Field | Value |
| --- | --- |
| Harbor registry | minnesotanlp/aar |
| Inspect task | minnesotanlp_aar |
| Latest digest | sha256:d93938542f547046aa37d7c62f8ef0e4ec690cc18860615d72ef03e142bb5403 |
| Samples | 1000 |
| Paper | arxiv |
| Source | https://github.com/minnesotanlp/the-amazing-agent-race |
See Task Parameters for the parameter set shared across all Harbor tasks.