strongreject/strongreject

Safeguards

StrongREJECT: a set of forbidden prompts plus an automated evaluator that measures how effectively jailbreaks elicit genuinely harmful, specific responses.

Run this task

CLI:

inspect eval inspect_harbor/strongreject --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import strongreject

eval(strongreject(), model="openai/gpt-5")
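Before running all samples, it can help to smoke-test on a small subset. The sketch below assembles the CLI invocation shown above and optionally appends Inspect's `--limit` flag. The helper name `build_eval_command` is hypothetical and not part of `inspect_harbor`; the task path and model string are copied from the commands above.

```python
import shlex
from typing import List, Optional


def build_eval_command(model: str, limit: Optional[int] = None) -> List[str]:
    """Build the argv list for running the StrongREJECT task with `inspect eval`.

    Hypothetical helper for illustration; it only assembles the command
    and does not run anything.
    """
    argv = ["inspect", "eval", "inspect_harbor/strongreject", "--model", model]
    if limit is not None:
        if limit <= 0:
            raise ValueError("limit must be a positive integer")
        # --limit restricts the run to the first N samples.
        argv += ["--limit", str(limit)]
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable command line for a 10-sample smoke test.
    print(shlex.join(build_eval_command("openai/gpt-5", limit=10)))
```

The list can be passed to `subprocess.run(argv, check=True)` to launch the run. In the Python API, `eval()` accepts an equivalent `limit` argument.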

Dataset information

Harbor registry: strongreject/strongreject
Inspect task: strongreject
Latest digest: sha256:c3d584ac2b1b50436fe5c6e8f99ebef907cf6808747f36371f4975d1b9bc6b2f
Samples: 150
Paper: arXiv
Source: https://github.com/alexandrasouly/strongreject
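The digest listed above has the form `sha256:` followed by 64 lowercase hexadecimal characters. When recording it in a script or config, a quick format check can catch truncated or mistyped values. This is a minimal sketch: `split_digest` is a hypothetical helper, and it validates only the string's shape, not the content the digest was computed over (which this page does not specify).

```python
import re

# Shape of the digest string shown in the dataset information:
# the literal prefix "sha256:" followed by 64 lowercase hex characters.
DIGEST_PATTERN = re.compile(r"^sha256:[0-9a-f]{64}$")


def split_digest(digest: str) -> tuple:
    """Validate a 'sha256:<hex>' string and return (algorithm, hex_value).

    Hypothetical helper for illustration only.
    """
    if not DIGEST_PATTERN.match(digest):
        raise ValueError(f"not a well-formed sha256 digest: {digest!r}")
    algorithm, hex_value = digest.split(":", 1)
    return algorithm, hex_value


if __name__ == "__main__":
    latest = "sha256:c3d584ac2b1b50436fe5c6e8f99ebef907cf6808747f36371f4975d1b9bc6b2f"
    print(split_digest(latest))
```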

See Task Parameters for the parameter set shared across all Harbor tasks.