webgen-bench/webgen-bench

Coding

WebGen-Bench: evaluating LLMs on generating interactive and functional websites from scratch.


Run this task

CLI:

inspect eval inspect_harbor/webgen_bench --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import webgen_bench

# Build the WebGen-Bench task and run it against GPT-5 (same as the CLI command above)
eval(webgen_bench(), model="openai/gpt-5")

Dataset information

Harbor registry: webgen-bench/webgen-bench
Inspect task: webgen_bench
Latest digest: sha256:e593e93b325f9942ccae818c2a5d4adedbd837ac2aad96c6c3e3fe623be29374
Samples: 101
Paper: arXiv
Source: https://github.com/mnluzimu/WebGen-Bench

See Task Parameters for the parameter set shared across all Harbor tasks.