bigcode/bigcodebench-hard-complete

Coding

BigCodeBench-Hard (Complete split): the hard subset of BigCodeBench, which evaluates LLMs on code generation that involves diverse function calls and complex instructions. Tasks are posed in completion format.


Run this task

CLI:

inspect eval inspect_harbor/bigcode_bigcodebench_hard_complete --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import bigcode_bigcodebench_hard_complete

eval(bigcode_bigcodebench_hard_complete(), model="openai/gpt-5")

Dataset information

Harbor registry: bigcode/bigcodebench-hard-complete
Inspect task: bigcode_bigcodebench_hard_complete
Latest digest: sha256:4c881f46251c98f6af182ec8eedbacc2c144db0761fcbbd789a60d7e69c30f69
Samples: 145
Paper: arxiv
Source: https://github.com/bigcode-project/bigcodebench
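
The Inspect task name above looks like the Harbor registry name with "/" and "-" replaced by "_". The sketch below illustrates that mapping and checks that the listed digest is a well-formed sha256 string. Both helpers (harbor_to_inspect_task, is_sha256_digest) are hypothetical and not part of inspect_harbor, and the naming rule is an assumption inferred from this one entry rather than a documented convention.

```python
import re

# Values copied from the dataset information above.
REGISTRY_NAME = "bigcode/bigcodebench-hard-complete"
DIGEST = "sha256:4c881f46251c98f6af182ec8eedbacc2c144db0761fcbbd789a60d7e69c30f69"


def harbor_to_inspect_task(registry_name: str) -> str:
    """Hypothetical helper: derive an Inspect task name from a registry name.

    Assumes the convention is simply "/" and "-" become "_".
    """
    return re.sub(r"[/-]", "_", registry_name)


def is_sha256_digest(digest: str) -> bool:
    """Return True if digest has the form 'sha256:' plus 64 lowercase hex chars."""
    return re.fullmatch(r"sha256:[0-9a-f]{64}", digest) is not None


if __name__ == "__main__":
    print(harbor_to_inspect_task(REGISTRY_NAME))
    print(is_sha256_digest(DIGEST))
```

The format check only confirms the shape of the digest string; it does not verify the dataset contents against it.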

See Task Parameters for the parameter set shared across all Harbor tasks.