kgmon/deepsearchqa
Knowledge
Reasoning
Assistants
DeepSearchQA is a 900-prompt factuality benchmark from Google DeepMind for evaluating deep research agents on difficult multi-step information-seeking tasks.
Run this task
CLI:
inspect eval inspect_harbor/kgmon_deepsearchqa --model openai/gpt-5Python:
from inspect_ai import eval
from inspect_harbor import kgmon_deepsearchqa
eval(kgmon_deepsearchqa(), model="openai/gpt-5")Dataset information
| Harbor registry | kgmon/deepsearchqa |
| Inspect task | kgmon_deepsearchqa |
| Latest digest | sha256:137df55552cd22440b9a1284609980d1e76920350dd1bf9f38aaa4380da57d15 |
| Samples | 900 |
| Paper | arxiv |
| Source | https://huggingface.co/datasets/google/deepsearchqa |
See Task Parameters for the parameter set shared across all Harbor tasks.