strongreject/strongreject
Safeguards
StrongREJECT: forbidden prompts plus an automated evaluator for measuring how effective jailbreaks are at eliciting genuinely harmful, specific responses.
Run this task
CLI:
inspect eval inspect_harbor/strongreject --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import strongreject
eval(strongreject(), model="openai/gpt-5")
Dataset information
| Harbor registry | strongreject/strongreject |
| Inspect task | strongreject |
| Latest digest | sha256:c3d584ac2b1b50436fe5c6e8f99ebef907cf6808747f36371f4975d1b9bc6b2f |
| Samples | 150 |
| Paper | arxiv |
| Source | https://github.com/alexandrasouly/strongreject |
See Task Parameters for the parameter set shared across all Harbor tasks.