minnesotanlp/aar

Reasoning
Assistants

The Amazing Agent Race (AAR): 1400 multi-step scavenger-hunt puzzles for evaluating LLM agents on tool use, web navigation, and arithmetic reasoning. Includes linear (800) and DAG (600) variants across 4 difficulty levels.
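
As a rough illustration of how the linear and DAG variants differ structurally, the sketch below models a puzzle as a set of steps with dependencies and resolves them in dependency order. The step names, values, and the solve helper are hypothetical and are not taken from the AAR dataset or its code; the sketch only assumes that a linear puzzle is a single chain of steps, while a DAG puzzle lets a step combine results from several earlier steps.

```python
from graphlib import TopologicalSorter

# Hypothetical puzzles: each step maps to (dependencies, function).
# Names and numbers are illustrative only, not from the AAR dataset.
linear_puzzle = {
    "a": ([], lambda: 3),
    "b": (["a"], lambda a: a * 4),
    "c": (["b"], lambda b: b + 5),
}

dag_puzzle = {
    "a": ([], lambda: 3),
    "b": ([], lambda: 10),
    "c": (["a", "b"], lambda a, b: a + b),   # merges two branches
    "d": (["c", "a"], lambda c, a: c * a),   # reuses an earlier result
}

def solve(puzzle, final_step):
    """Evaluate steps in dependency order and return the final step's value."""
    graph = {name: deps for name, (deps, _) in puzzle.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        deps, fn = puzzle[name]
        # Pass dependency results in the order the step declares them.
        results[name] = fn(*(results[d] for d in deps))
    return results[final_step]

print(solve(linear_puzzle, "c"))  # 17
print(solve(dag_puzzle, "d"))     # 39
```

In the linear case every step has exactly one predecessor, so the agent can work front to back; in the DAG case it has to keep intermediate results around until a later step needs them.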

Run this task

CLI:

inspect eval inspect_harbor/minnesotanlp_aar --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import minnesotanlp_aar

eval(minnesotanlp_aar(), model="openai/gpt-5")

Dataset information

Harbor registry: minnesotanlp/aar
Inspect task: minnesotanlp_aar
Latest digest: sha256:d93938542f547046aa37d7c62f8ef0e4ec690cc18860615d72ef03e142bb5403
Samples: 1000
Paper: arxiv
Source: https://github.com/minnesotanlp/the-amazing-agent-race

See Task Parameters for the parameter set shared across all Harbor tasks.