replicationbench/replicationbench

Science
Physics
Coding

ReplicationBench: end-to-end replication of astrophysics research papers — agents reproduce implementation, methodology, and core findings of expert-validated papers, scored on result accuracy.

← Back to Registry

Run this task

CLI:

inspect eval inspect_harbor/replicationbench --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import replicationbench

eval(replicationbench(), model="openai/gpt-5")

Dataset information

Harbor registry replicationbench/replicationbench
Inspect task replicationbench
Latest digest sha256:323a161d95ba985abfe8895fd6ca00bfb4233dad8a08adb6658e696bb2174f53
Samples 90
Paper arxiv
Source https://github.com/Christine8888/replicationbench-release

See Task Parameters for the parameter set shared across all Harbor tasks.