gaia/gaia

Assistants
Multimodal

GAIA: real-world questions across three difficulty levels evaluating general AI assistants on reasoning, multimodality, web browsing, and tool use.

Run this task

CLI:

inspect eval inspect_harbor/gaia --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import gaia

eval(gaia(), model="openai/gpt-5")
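For a quick smoke test before a full run, it may help to evaluate only a few samples. The sketch below wraps the call above in a hypothetical helper, `run_gaia_subset` (not part of `inspect_harbor`), and assumes the `limit` argument of `inspect_ai.eval` is used to cap the number of samples; the default of 5 is arbitrary.

```python
def run_gaia_subset(model: str = "openai/gpt-5", limit: int = 5):
    """Run GAIA on at most `limit` samples (hypothetical helper for illustration)."""
    # Imports are deferred so that defining this helper does not itself
    # require inspect_ai or inspect_harbor to be installed.
    from inspect_ai import eval
    from inspect_harbor import gaia

    # `limit` caps how many samples are evaluated; omit it to run the full task.
    return eval(gaia(), model=model, limit=limit)
```

Calling `run_gaia_subset()` needs both packages installed and credentials for the chosen model provider.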

Dataset information

Harbor registry: gaia/gaia
Inspect task: gaia
Latest digest: sha256:bbc356f476e0b70ba77da11a9be7d6345918d1e4a2daade0d6dfb82ee6f7b761
Samples: 165
Paper: arxiv
Source: https://huggingface.co/datasets/gaia-benchmark/GAIA

See Task Parameters for the parameter set shared across all Harbor tasks.