apple/mmau
Assistants
Coding
Mathematics
Reasoning
MMAU (Massive Multitask Agent Understanding): Apple’s holistic agent benchmark covering tool-use, DAG QA, data science/ML coding, contest programming, and mathematics.
Run this task
CLI:
inspect eval inspect_harbor/apple_mmau --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import apple_mmau
eval(apple_mmau(), model="openai/gpt-5")
Dataset information
| Harbor registry | apple/mmau |
| Inspect task | apple_mmau |
| Latest digest | sha256:435e5f12af62d3a7537608bdc6652757c58488fcc3345e00cc0dcc0340c72417 |
| Samples | 1000 |
| Paper | arxiv |
| Source | https://github.com/apple/axlearn/tree/main/docs/research/mmau |
See Task Parameters for the parameter set shared across all Harbor tasks.
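Since the dataset lists 1000 samples, it can be useful to try a small subset before a full run. The sketch below assembles the CLI invocation shown above and optionally appends Inspect's `--limit` option to cap the number of samples. The helper name `build_eval_command` is hypothetical and not part of Inspect or Harbor; the script only prints the command rather than executing it.

```python
import shlex


def build_eval_command(model="openai/gpt-5", limit=None):
    """Assemble the `inspect eval` argv for the apple_mmau task.

    Hypothetical helper for illustration; not part of inspect_ai or inspect_harbor.
    """
    cmd = ["inspect", "eval", "inspect_harbor/apple_mmau", "--model", model]
    if limit is not None:
        # --limit caps how many samples are evaluated
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    # Print a shell-safe version of the command for a 10-sample trial run
    print(shlex.join(build_eval_command(limit=10)))
```

The returned list can be passed to `subprocess.run(...)` once the `inspect` CLI and the `inspect_harbor` package are installed and a model API key is configured.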