kgmon/deepsearchqa

Knowledge

Reasoning

Assistants

DeepSearchQA is a 900-prompt factuality benchmark from Google DeepMind for evaluating deep research agents on difficult multi-step information-seeking tasks.

← Back to Registry

Run this task

CLI:

inspect eval inspect_harbor/kgmon_deepsearchqa --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import kgmon_deepsearchqa

eval(kgmon_deepsearchqa(), model="openai/gpt-5")

Dataset information

Harbor registry	kgmon/deepsearchqa
Inspect task	`kgmon_deepsearchqa`
Latest digest	sha256:137df55552cd22440b9a1284609980d1e76920350dd1bf9f38aaa4380da57d15
Samples	900
Paper	arxiv
Source	https://huggingface.co/datasets/google/deepsearchqa

See Task Parameters for the parameter set shared across all Harbor tasks.