scienceagentbench/scienceagentbench
Science
Coding
Reasoning
ScienceAgentBench: data-driven scientific discovery via Python programs across 4 disciplines.
Run this task
CLI:
inspect eval inspect_harbor/scienceagentbench --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import scienceagentbench
eval(scienceagentbench(), model="openai/gpt-5")
Dataset information
| Field | Value |
| --- | --- |
| Harbor registry | scienceagentbench/scienceagentbench |
| Inspect task | scienceagentbench |
| Latest digest | sha256:e1b96865e47796cdfa47afae7b9bab3d4cf0cbea7d699e085ce660b716a57041 |
| Samples | 102 |
| Paper | arxiv |
| Source | https://github.com/OSU-NLP-Group/ScienceAgentBench |
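If the digest in the table is copied into a script or config to record which dataset version was evaluated, a quick format check can catch truncated or mistyped values. The sketch below is only illustrative: the helper name `is_sha256_digest` is hypothetical, and the assumed format (`sha256:` followed by 64 lowercase hex characters) is inferred from the value shown in the table, not from any Harbor specification.

```python
import re

# Digest copied verbatim from the "Latest digest" row of the table.
LATEST_DIGEST = "sha256:e1b96865e47796cdfa47afae7b9bab3d4cf0cbea7d699e085ce660b716a57041"

# Assumed format: the literal prefix "sha256:" followed by 64 lowercase hex characters.
_SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def is_sha256_digest(value: str) -> bool:
    """Return True if value looks like a well-formed sha256 digest string.

    Hypothetical helper: it checks the shape of the string only and does not
    verify that the digest matches any dataset contents.
    """
    return _SHA256_DIGEST.fullmatch(value) is not None
```

Calling `is_sha256_digest(LATEST_DIGEST)` before storing the value guards against copy-paste errors; it says nothing about whether the digest is still the latest one in the registry.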
See Task Parameters for the parameter set shared across all Harbor tasks.
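For scripted runs, the CLI invocation shown under "Run this task" can be assembled as an argument list in Python before being handed to a process runner. This is a minimal sketch, not part of Inspect Harbor: `build_eval_command` is a hypothetical helper, and the optional `--limit` flag is assumed to be the standard Inspect option for capping the number of samples (useful for a smoke test before running all 102).

```python
import shlex
from typing import List, Optional

# Task identifier taken from the CLI example on this page.
TASK = "inspect_harbor/scienceagentbench"


def build_eval_command(model: str, limit: Optional[int] = None) -> List[str]:
    """Build the `inspect eval` argument list for this task.

    Hypothetical helper for illustration. `limit`, if given, is passed as
    `--limit` (assumed here to cap how many samples are evaluated).
    """
    cmd = ["inspect", "eval", TASK, "--model", model]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd


# Show the command as a shell-safe string without executing it.
print(shlex.join(build_eval_command("openai/gpt-5", limit=5)))
```

Keeping the command as a list avoids shell quoting problems if it is later passed to `subprocess.run`; the snippet itself only prints the command and does not start an evaluation.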