gaia/gaia
Assistants
Multimodal
GAIA: real-world questions at three difficulty levels that evaluate general AI assistants on reasoning, multimodality, web browsing, and tool use.
Run this task
CLI:

```bash
inspect eval inspect_harbor/gaia --model openai/gpt-5
```

Python:

```python
from inspect_ai import eval
from inspect_harbor import gaia

eval(gaia(), model="openai/gpt-5")
```

Dataset information
| Field | Value |
|---|---|
| Harbor registry | gaia/gaia |
| Inspect task | gaia |
| Latest digest | sha256:bbc356f476e0b70ba77da11a9be7d6345918d1e4a2daade0d6dfb82ee6f7b761 |
| Samples | 165 |
| Paper | arXiv |
| Source | https://huggingface.co/datasets/gaia-benchmark/GAIA |
See Task Parameters for the parameter set shared across all Harbor tasks.