harbor/rewardhackbench

Safeguards

RewardHackBench: judge benchmark for detecting reward hacking in agent trajectories — each trace is labelled for harness-level cheating, task-level reward hacking, and refusals.

← Back to Registry

Run this task

CLI:

inspect eval inspect_harbor/harbor_rewardhackbench --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import harbor_rewardhackbench

eval(harbor_rewardhackbench(), model="openai/gpt-5")

Dataset information

Harbor registry	harbor/rewardhackbench
Inspect task	`harbor_rewardhackbench`
Latest digest	sha256:0dd16e1029495cba180809b7ecfbae375089881b11ff11e369bfbbf3c72a2fd8
Samples	846

See Task Parameters for the parameter set shared across all Harbor tasks.