gaia/gaia

Assistants

Multimodal

GAIA is a benchmark of real-world questions, spread across three difficulty levels, that evaluates general AI assistants on reasoning, multimodality, web browsing, and tool use.

Run this task

CLI:

inspect eval inspect_harbor/gaia --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import gaia

eval(gaia(), model="openai/gpt-5")
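Since the full task has 165 samples, a quick smoke test on a handful of them can be useful before committing to a complete run. The sketch below assumes the `limit` argument of `inspect_ai.eval`, which caps the number of samples evaluated. The helper name `run_gaia_smoke_test` and its default values are hypothetical and not part of `inspect_harbor`.

```python
def run_gaia_smoke_test(model: str = "openai/gpt-5", limit: int = 5):
    """Run only the first `limit` GAIA samples and return the eval logs.

    Hypothetical helper for illustration; not part of inspect_harbor.
    """
    # Imports are deferred so that defining this helper does not itself
    # require inspect_ai or inspect_harbor to be installed.
    from inspect_ai import eval
    from inspect_harbor import gaia

    # `limit` restricts the run to the first N samples of the task.
    return eval(gaia(), model=model, limit=limit)
```

Calling `run_gaia_smoke_test(limit=3)` would evaluate three samples with the default model; each returned log exposes a `status` field that can be checked before launching the full run.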

Harbor registry	gaia/gaia
Inspect task	`gaia`
Latest digest	sha256:bbc356f476e0b70ba77da11a9be7d6345918d1e4a2daade0d6dfb82ee6f7b761
Samples	165
Paper	arxiv
Source	https://huggingface.co/datasets/gaia-benchmark/GAIA

See Task Parameters for the parameter set shared across all Harbor tasks.