kgmon/deepsearchqa

Knowledge
Reasoning
Assistants

DeepSearchQA is a 900-prompt factuality benchmark from Google DeepMind for evaluating deep research agents on difficult multi-step information-seeking tasks.

← Back to Registry

Run this task

CLI:

inspect eval inspect_harbor/kgmon_deepsearchqa --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import kgmon_deepsearchqa

eval(kgmon_deepsearchqa(), model="openai/gpt-5")

Dataset information

Harbor registry kgmon/deepsearchqa
Inspect task kgmon_deepsearchqa
Latest digest sha256:137df55552cd22440b9a1284609980d1e76920350dd1bf9f38aaa4380da57d15
Samples 900
Paper arxiv
Source https://huggingface.co/datasets/google/deepsearchqa

See Task Parameters for the parameter set shared across all Harbor tasks.