apple/mmau

Assistants
Coding
Mathematics
Reasoning

MMAU (Massive Multitask Agent Understanding) is Apple’s holistic agent benchmark, covering tool use, DAG QA, data science/ML coding, contest programming, and mathematics.

Run this task

CLI:

inspect eval inspect_harbor/apple_mmau --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import apple_mmau

eval(apple_mmau(), model="openai/gpt-5")
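
Before committing to a full run, a short smoke test on a few samples may be worthwhile. The sketch below only assembles the CLI invocation shown above, adding Inspect's `--limit` option; it does not run anything. `build_inspect_command` is a hypothetical helper, not part of `inspect_ai` or `inspect_harbor`, and the limit of 10 is an arbitrary choice.

```python
import shlex


def build_inspect_command(task, model, limit=None):
    """Assemble an `inspect eval` argument list (hypothetical helper)."""
    args = ["inspect", "eval", task, "--model", model]
    if limit is not None:
        # --limit caps how many samples are evaluated.
        args += ["--limit", str(limit)]
    return args


cmd = build_inspect_command("inspect_harbor/apple_mmau", "openai/gpt-5", limit=10)

# Print the command as a single shell-safe string for copy and paste.
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` if the command should be launched from a script rather than typed into a shell.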

Dataset information

Harbor registry: apple/mmau
Inspect task: apple_mmau
Latest digest: sha256:435e5f12af62d3a7537608bdc6652757c58488fcc3345e00cc0dcc0340c72417
Samples: 1000
Paper: arXiv
Source: https://github.com/apple/axlearn/tree/main/docs/research/mmau
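
When logging results, it can help to store the dataset digest alongside them so that a run can be tied to a specific dataset version. The following is a minimal sketch, assuming digests always take the `sha256:<64 lowercase hex characters>` form of the value listed above. `parse_digest` is a hypothetical helper; it only checks the string's shape and does not query the Harbor registry.

```python
import re

# Assumed format: the literal prefix "sha256:" followed by 64 lowercase hex characters.
DIGEST_PATTERN = re.compile(r"(?P<algorithm>sha256):(?P<hex>[0-9a-f]{64})")


def parse_digest(digest):
    """Split a digest string into (algorithm, hex value), or raise ValueError."""
    match = DIGEST_PATTERN.fullmatch(digest)
    if match is None:
        raise ValueError(f"not a well-formed sha256 digest: {digest!r}")
    return match.group("algorithm"), match.group("hex")


algorithm, hex_value = parse_digest(
    "sha256:435e5f12af62d3a7537608bdc6652757c58488fcc3345e00cc0dcc0340c72417"
)
print(algorithm, len(hex_value))
```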

See Task Parameters for the parameter set shared across all Harbor tasks.