openai/simpleqa
Knowledge
SimpleQA: short, fact-seeking questions adversarially collected against GPT-4 to measure short-form factuality and calibration of frontier LLMs.
Run this task
CLI:
inspect eval inspect_harbor/openai_simpleqa --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import openai_simpleqa
eval(openai_simpleqa(), model="openai/gpt-5")
Dataset information
| Harbor registry | openai/simpleqa |
| Inspect task | openai_simpleqa |
| Latest digest | sha256:22f25921ded881aca13cf5d18b8d3bbc91e2b9bf44d17108292dcc40fcb5f0d4 |
| Samples | 1000 |
| Paper | arxiv |
| Source | https://github.com/openai/simple-evals |
See Task Parameters for the parameter set shared across all Harbor tasks.
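
The description at the top says SimpleQA measures short-form factuality. As a rough illustration of how per-question grades might be aggregated, the sketch below follows the three-way grading described in the SimpleQA paper (correct, incorrect, not attempted) and computes overall accuracy, accuracy on attempted questions, and their harmonic mean. The function name `summarize_grades` and the grade label strings are hypothetical, and whether the Harbor task reports exactly these metrics is an assumption, not something this page states.

```python
from collections import Counter


def summarize_grades(grades):
    """Aggregate per-question grades into SimpleQA-style summary metrics.

    `grades` is a list of strings, each one of "correct", "incorrect",
    or "not_attempted" (hypothetical labels chosen for this sketch).
    """
    if not grades:
        raise ValueError("grades must not be empty")

    counts = Counter(grades)
    total = len(grades)
    correct = counts["correct"]
    attempted = correct + counts["incorrect"]

    # Fraction of all questions answered correctly.
    overall_correct = correct / total
    # Fraction of attempted questions answered correctly.
    correct_given_attempted = correct / attempted if attempted else 0.0

    # Harmonic mean of the two rates; 0.0 when both are zero.
    denom = overall_correct + correct_given_attempted
    f_score = 2 * overall_correct * correct_given_attempted / denom if denom else 0.0

    return {
        "overall_correct": overall_correct,
        "correct_given_attempted": correct_given_attempted,
        "not_attempted": counts["not_attempted"] / total,
        "f_score": f_score,
    }


if __name__ == "__main__":
    example = ["correct", "correct", "incorrect", "not_attempted"]
    print(summarize_grades(example))
```

Keeping "not attempted" separate from "incorrect" is what lets the summary reward a model for declining to answer rather than guessing wrongly.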