scienceagentbench/scienceagentbench

Science
Coding
Reasoning

ScienceAgentBench: data-driven scientific discovery via Python programs across 4 disciplines.

Run this task

CLI:

inspect eval inspect_harbor/scienceagentbench --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import scienceagentbench

eval(scienceagentbench(), model="openai/gpt-5")
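
With 102 samples, a full run can be slow and costly, so it may help to try a handful of samples first. The sketch below only assembles the CLI invocation shown above and optionally appends Inspect's `--limit` option; it does not run anything. The helper name `build_eval_command` and the sample count of 5 are illustrative assumptions, not part of inspect_harbor.

```python
import shlex
from typing import List, Optional

# Task path copied from the CLI example above.
TASK = "inspect_harbor/scienceagentbench"


def build_eval_command(model: str, limit: Optional[int] = None) -> List[str]:
    """Return the argv for `inspect eval`, optionally capped with --limit.

    Hypothetical helper for illustration only: it builds the command
    but never executes it.
    """
    cmd = ["inspect", "eval", TASK, "--model", model]
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive integer")
        # --limit restricts the run to the first N samples.
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable command for a 5-sample smoke test.
    print(shlex.join(build_eval_command("openai/gpt-5", limit=5)))
```

Building the command as a list rather than a single string keeps quoting explicit if it is later passed to `subprocess.run`.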

Dataset information

Harbor registry: scienceagentbench/scienceagentbench
Inspect task: scienceagentbench
Latest digest: sha256:e1b96865e47796cdfa47afae7b9bab3d4cf0cbea7d699e085ce660b716a57041
Samples: 102
Paper: arxiv
Source: https://github.com/OSU-NLP-Group/ScienceAgentBench

See Task Parameters for the parameter set shared across all Harbor tasks.