bigcode/bigcodebench-hard-complete
Coding
BigCodeBench-Hard (Complete split): the hard subset of BigCodeBench, which evaluates LLMs on code generation that requires diverse function calls and complex instructions, presented in completion format.
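To make "completion format" concrete, here is a minimal sketch: the model receives the start of a function (imports, signature, docstring) and returns the body. The prompt, the completion, and the `run_candidate` helper are all hypothetical and written for illustration; they are not an actual benchmark sample or the harness Harbor uses. The benchmark scores completions against test cases, for which the single call at the end is only a stand-in.

```python
# Hypothetical prompt in the style of a Complete-split task: imports, a
# signature named task_func, and a docstring. Not an actual benchmark sample.
PROMPT = '''import collections
import re

def task_func(text):
    """
    Count how often each word occurs in `text`, ignoring case.

    Returns:
        dict: word -> count
    """
'''

# What a model might return: only the function body, indented to match.
COMPLETION = '''    words = re.findall(r"[a-z']+", text.lower())
    return dict(collections.Counter(words))
'''


def run_candidate(prompt: str, completion: str) -> dict:
    """Join prompt and completion, execute the result, return its namespace."""
    namespace: dict = {}
    # Illustration only: generated code should be run in a sandbox.
    exec(prompt + completion, namespace)
    return namespace


namespace = run_candidate(PROMPT, COMPLETION)
task_func = namespace["task_func"]
result = task_func("The cat saw the other cat")
print(result)
```

The point of the sketch is that the prompt and the completion only form valid code once concatenated, so indentation and the names already imported in the prompt both matter to the model's answer.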
Run this task
CLI:
inspect eval inspect_harbor/bigcode_bigcodebench_hard_complete --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import bigcode_bigcodebench_hard_complete
eval(bigcode_bigcodebench_hard_complete(), model="openai/gpt-5")
Dataset information
| Harbor registry | bigcode/bigcodebench-hard-complete |
| Inspect task | bigcode_bigcodebench_hard_complete |
| Latest digest | sha256:4c881f46251c98f6af182ec8eedbacc2c144db0761fcbbd789a60d7e69c30f69 |
| Samples | 145 |
| Paper | arxiv |
| Source | https://github.com/bigcode-project/bigcodebench |
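If you record the digest alongside evaluation results, a small check that the string is well formed can catch copy errors. The sketch below assumes the digest follows the common `algorithm:hex` shape seen in the table; how Harbor computes the digest is not stated here, so `parse_digest` and `matches` are hypothetical helpers that only validate the format and compare against a SHA-256 you compute yourself.

```python
import hashlib
import re

# Digest string copied from the table above.
DIGEST = "sha256:4c881f46251c98f6af182ec8eedbacc2c144db0761fcbbd789a60d7e69c30f69"


def parse_digest(digest: str) -> tuple[str, str]:
    """Split 'sha256:<hex>' into its parts, rejecting malformed values."""
    algorithm, sep, hex_value = digest.partition(":")
    if not sep or algorithm != "sha256":
        raise ValueError(f"unsupported digest: {digest!r}")
    if not re.fullmatch(r"[0-9a-f]{64}", hex_value):
        raise ValueError(f"not a 64-character lowercase hex value: {hex_value!r}")
    return algorithm, hex_value


def matches(digest: str, payload: bytes) -> bool:
    """Return True if the SHA-256 of payload equals the digest's hex value."""
    _, expected = parse_digest(digest)
    return hashlib.sha256(payload).hexdigest() == expected


algorithm, hex_value = parse_digest(DIGEST)
print(algorithm, len(hex_value))
```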
See Task Parameters for the parameter set shared across all Harbor tasks.