bird-bench@parity
Coding
BIRD SQL parity subset (150 tasks, seed 42).
Run this task
CLI:
inspect eval inspect_harbor/bird_bench_parity --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import bird_bench_parity
eval(bird_bench_parity(), model="openai/gpt-5")
Dataset information
| Harbor registry | bird-bench@parity |
| Inspect task | bird_bench_parity |
| Version | parity |
| Samples | 150 |
| Paper | arxiv |
See Task Parameters for the parameter set shared across all Harbor tasks.