openai/simpleqa

Knowledge

SimpleQA: short, fact-seeking questions adversarially collected against GPT-4 to measure short-form factuality and calibration of frontier LLMs.
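To make "short-form factuality" concrete: the SimpleQA paper grades each answer as correct, incorrect, or not attempted, and summarizes a run with the share of correct answers overall, the share correct among attempted questions, and their harmonic mean. The sketch below computes those aggregates from a list of grades. It is only an illustration: the function name `simpleqa_metrics` and the grade strings are hypothetical, and the scorer used by the Harbor task may report different metrics.

```python
from collections import Counter


def simpleqa_metrics(grades):
    """Aggregate per-question grades into SimpleQA-style summary metrics.

    `grades` is a list of strings, each one of the (hypothetical) labels
    "correct", "incorrect", or "not_attempted".
    """
    counts = Counter(grades)
    total = len(grades)
    correct = counts["correct"]
    attempted = correct + counts["incorrect"]

    # Share of all questions answered correctly.
    overall_correct = correct / total if total else 0.0
    # Share answered correctly among questions the model attempted.
    correct_given_attempted = correct / attempted if attempted else 0.0

    # Harmonic mean of the two rates; defined as 0 when both are 0.
    denom = overall_correct + correct_given_attempted
    f_score = 2 * overall_correct * correct_given_attempted / denom if denom else 0.0

    return {
        "overall_correct": overall_correct,
        "correct_given_attempted": correct_given_attempted,
        "f_score": f_score,
    }


if __name__ == "__main__":
    example = ["correct", "correct", "incorrect", "not_attempted"]
    print(simpleqa_metrics(example))
```

Declining to answer lowers the overall rate but not the rate among attempted questions, which is why the two are reported separately: a model that guesses less can look better on the second measure while answering fewer questions.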

Run this task

CLI:

inspect eval inspect_harbor/openai_simpleqa --model openai/gpt-5

Python:

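# Note: this import shadows Python's built-in eval() in the current module.
from inspect_ai import eval
from inspect_harbor import openai_simpleqa

# Run the SimpleQA task against the chosen model.
eval(openai_simpleqa(), model="openai/gpt-5")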

Harbor registry	openai/simpleqa
Inspect task	`openai_simpleqa`
Latest digest	sha256:22f25921ded881aca13cf5d18b8d3bbc91e2b9bf44d17108292dcc40fcb5f0d4
Samples	4326
Paper	arxiv
Source	https://github.com/openai/simple-evals

See Task Parameters for the parameter set shared across all Harbor tasks.