sldbench/sldbench
Science
Reasoning
Mathematics
SLDBench: the first benchmark for scaling-law discovery, with tasks curated from LLM training experiments in which agents must autonomously fit and extrapolate scaling laws.
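This card does not describe SLDBench's task format, data files, or scoring. The sketch below only illustrates what "fit and extrapolate a scaling law" means in the simplest case. It is hypothetical and assumes a pure power law y = a · x^(-α), with no irreducible-loss offset, fit by ordinary least squares in log-log space. Real scaling-law forms, including those in SLDBench tasks, are often richer than this. The helper names `fit_power_law` and `predict` and the synthetic data are illustrative only.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**(-alpha) by least squares on log(y) vs log(x).

    Hypothetical helper: assumes all xs and ys are strictly positive and
    that a pure power law (no constant offset) describes the data.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # Slope and intercept of the log-log regression line.
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log y = log a - alpha * log x, so a = exp(intercept), alpha = -slope.
    return math.exp(intercept), -slope


def predict(a, alpha, x):
    """Extrapolate the fitted power law to a new scale x."""
    return a * x ** (-alpha)


# Synthetic, noise-free data (not from SLDBench): y = 10 * N**-0.3.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [10 * n ** -0.3 for n in sizes]

a, alpha = fit_power_law(sizes, losses)
print(round(a, 3), round(alpha, 3))  # 10.0 0.3
print(predict(a, alpha, 1e10))  # extrapolate one order of magnitude beyond the data
```

Fitting in log space turns the power law into a straight line, so plain linear regression is enough. An offset term such as y = a · x^(-α) + c would instead need a nonlinear fit.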
Run this task
CLI:

```sh
inspect eval inspect_harbor/sldbench --model openai/gpt-5
```

Python:

```python
from inspect_ai import eval
from inspect_harbor import sldbench

eval(sldbench(), model="openai/gpt-5")
```

Dataset information
| Field | Value |
|---|---|
| Harbor registry | sldbench/sldbench |
| Inspect task | sldbench |
| Latest digest | sha256:369ce8a4825cae7cfb75ef0f5886f3081f072123fd37630f6d6aeef8dec46089 |
| Samples | 8 |
| Paper | arXiv |
| Source | https://github.com/linhaowei1/SLD |
See Task Parameters for the parameter set shared across all Harbor tasks.