simpleqa@1.0
Knowledge
SimpleQA: 4,326 short, fact-seeking questions from OpenAI for evaluating language model factuality. Uses LLM-as-a-judge grading.
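To make the LLM-as-a-judge grading above more concrete, here is a minimal sketch of how per-question judge verdicts could be aggregated into summary rates. It assumes the judge assigns each answer one of three labels. The label names and the `summarize` helper are hypothetical illustrations, not part of `inspect_harbor`, and the labels and metrics this task actually reports may differ.

```python
from collections import Counter

# Hypothetical judge labels (an assumption for illustration only).
CORRECT = "CORRECT"
INCORRECT = "INCORRECT"
NOT_ATTEMPTED = "NOT_ATTEMPTED"


def summarize(grades):
    """Aggregate a list of per-question judge labels into summary rates."""
    counts = Counter(grades)
    total = len(grades)
    # An answer counts as attempted if the judge marked it right or wrong.
    attempted = counts[CORRECT] + counts[INCORRECT]
    return {
        "correct": counts[CORRECT] / total if total else 0.0,
        "not_attempted": counts[NOT_ATTEMPTED] / total if total else 0.0,
        # Accuracy restricted to the questions the model actually answered.
        "correct_given_attempted": counts[CORRECT] / attempted if attempted else 0.0,
    }


if __name__ == "__main__":
    sample = [CORRECT, CORRECT, INCORRECT, NOT_ATTEMPTED]
    print(summarize(sample))
```

Separating "not attempted" from "incorrect" lets a summary distinguish a model that declines to answer from one that answers wrongly, which matters for a factuality benchmark.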
Run this task
CLI:
inspect eval inspect_harbor/simpleqa_1_0 --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import simpleqa_1_0
eval(simpleqa_1_0(), model="openai/gpt-5")
Dataset information
| Harbor registry | simpleqa@1.0 |
| Inspect task | simpleqa_1_0 |
| Version | 1.0 |
| Samples | 4326 |
| Paper | arXiv |
See Task Parameters for the parameter set shared across all Harbor tasks.