satbench/satbench
Reasoning
SATBench: logical-reasoning puzzles automatically generated from SAT formulas with adjustable difficulty, validated through both LLM and SAT-solver consistency checks.
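To make the "SAT-solver consistency check" idea concrete, here is a minimal sketch of what deciding a small SAT formula looks like. It is not SATBench's generation or validation pipeline, which this page does not describe. The function name `find_model`, the DIMACS-style integer encoding, and both example formulas are assumptions chosen for illustration, and brute-force enumeration stands in for a real SAT solver.

```python
from itertools import product
from typing import Optional

# Hypothetical encoding: a CNF formula is a list of clauses, each clause a
# list of non-zero ints. Positive k means variable k, negative k means NOT k.


def find_model(clauses: list[list[int]], num_vars: int) -> Optional[dict[int, bool]]:
    """Return the first satisfying assignment found, or None if unsatisfiable."""
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        # A clause holds if at least one of its literals is true.
        if all(
            any(assignment[abs(lit)] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return assignment
    return None


# (x1 OR x2) AND (NOT x1 OR x2) AND (NOT x2 OR x3)
sat_formula = [[1, 2], [-1, 2], [-2, 3]]
# x1 AND (NOT x1): no assignment can satisfy both clauses
unsat_formula = [[1], [-1]]

sat_model = find_model(sat_formula, num_vars=3)
unsat_model = find_model(unsat_formula, num_vars=1)
print(sat_model)
print(unsat_model)
```

Exhaustive search is exponential in the number of variables, so it only suits toy formulas like these; it is used here because it needs no third-party solver.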
Run this task
CLI:
inspect eval inspect_harbor/satbench --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import satbench
eval(satbench(), model="openai/gpt-5")
Dataset information
| Field | Value |
| --- | --- |
| Harbor registry | satbench/satbench |
| Inspect task | satbench |
| Latest digest | sha256:4b921bb49ebe0513a784783eeac9561e9d216339de1e4cb20c43018dd0502a1e |
| Samples | 1000 |
| Paper | arxiv |
| Source | https://github.com/Anjiang-Wei/SATBench |
See Task Parameters for the parameter set shared across all Harbor tasks.