replicationbench/replicationbench

Science

Physics

Coding

ReplicationBench: end-to-end replication of astrophysics research papers. Agents reproduce the implementation, methodology, and core findings of expert-validated papers, and are scored on the accuracy of their results.

Run this task

CLI:

inspect eval inspect_harbor/replicationbench --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import replicationbench

eval(replicationbench(), model="openai/gpt-5")
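
To run the same task against several models, the CLI invocation shown above can be generated programmatically. The following is a minimal sketch that assumes only the `inspect eval <task> --model <model>` form given on this page. `build_eval_command` is a hypothetical helper, not part of `inspect_ai` or `inspect_harbor`, and the second model name is a placeholder.

```python
import shlex

# Task path taken from the CLI example on this page.
TASK = "inspect_harbor/replicationbench"


def build_eval_command(model: str, task: str = TASK) -> list[str]:
    """Return the argv list for one `inspect eval` run (hypothetical helper)."""
    if not model:
        raise ValueError("model must be a non-empty string")
    return ["inspect", "eval", task, "--model", model]


if __name__ == "__main__":
    # "example-provider/example-model" is a placeholder, not a real model name.
    for model in ["openai/gpt-5", "example-provider/example-model"]:
        # shlex.join quotes arguments only where the shell requires it.
        print(shlex.join(build_eval_command(model)))
```

The helper returns an argv list rather than a single string so it can be passed directly to `subprocess.run` without shell quoting concerns.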

Dataset information

Harbor registry	replicationbench/replicationbench
Inspect task	`replicationbench`
Latest digest	sha256:323a161d95ba985abfe8895fd6ca00bfb4233dad8a08adb6658e696bb2174f53
Samples	90
Paper	arxiv
Source	https://github.com/Christine8888/replicationbench-release
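
When recording the digest above in a config file or lockfile, it can help to check that the string is well formed before using it. The sketch below validates only the format (a `sha256:` prefix followed by 64 lowercase hexadecimal characters). It does not verify the dataset contents, since this page does not state what the digest is computed over. `parse_sha256_digest` is a hypothetical helper name.

```python
import re

# Digest copied from the "Latest digest" row above.
DIGEST = "sha256:323a161d95ba985abfe8895fd6ca00bfb4233dad8a08adb6658e696bb2174f53"

# A SHA-256 hash is 32 bytes, i.e. 64 hexadecimal characters.
_SHA256_RE = re.compile(r"sha256:([0-9a-f]{64})")


def parse_sha256_digest(value: str) -> str:
    """Return the hex part of a `sha256:<hex>` digest, or raise ValueError."""
    match = _SHA256_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a well-formed sha256 digest: {value!r}")
    return match.group(1)


if __name__ == "__main__":
    hex_part = parse_sha256_digest(DIGEST)
    print(len(hex_part))
```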

See Task Parameters for the parameter set shared across all Harbor tasks.