tencent/autocodebench

Coding

Multilingual automated code generation benchmark evaluating LLMs across diverse programming tasks and languages.

Run this task

CLI:

inspect eval inspect_harbor/tencent_autocodebench --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import tencent_autocodebench

eval(tencent_autocodebench(), model="openai/gpt-5")
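For scripted runs it can help to assemble the CLI invocation above in code. The following is a minimal sketch, not part of Inspect or Harbor: the helper name `build_eval_command` is hypothetical, and it assumes Inspect's general `--limit` option (which caps the number of samples evaluated) applies to this task as it does to other Inspect tasks.

```python
import shlex
from typing import List, Optional


def build_eval_command(model: str, limit: Optional[int] = None) -> List[str]:
    """Build the argv for `inspect eval` on this task.

    Hypothetical helper for illustration; it only constructs the command
    and does not run anything.
    """
    cmd = [
        "inspect",
        "eval",
        "inspect_harbor/tencent_autocodebench",
        "--model",
        model,
    ]
    if limit is not None:
        # Assumption: --limit restricts the run to the first N samples.
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    # Print a shell-safe command line for a 10-sample smoke test.
    print(shlex.join(build_eval_command("openai/gpt-5", limit=10)))
```

The returned list can be passed to `subprocess.run` unchanged. Running a small subset first is a cheap way to confirm that the model credentials and environment work before evaluating all 200 samples.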

Harbor registry	tencent/autocodebench
Inspect task	`tencent_autocodebench`
Latest digest	sha256:da30a5e97eeccc2d024a2ff947fb99966ea88bed5b7077ee451d2ae72e645caf
Samples	200
Paper	arxiv
Source	https://github.com/Tencent-Hunyuan/AutoCodeBenchmark

See Task Parameters for the parameter set shared across all Harbor tasks.