mmtb/multimedia-terminalbench

Coding
Multimodal

MultiMedia-TerminalBench (MMTB): a benchmark of 105 realistic multimedia-file tasks in persistent terminal workspaces, across 5 meta-categories grounded in paid practitioner workflows.

Run this task

CLI:

inspect eval inspect_harbor/mmtb_multimedia_terminalbench --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import mmtb_multimedia_terminalbench

eval(mmtb_multimedia_terminalbench(), model="openai/gpt-5")
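Since the benchmark has 105 samples, a short smoke run on a few samples may be useful before a full evaluation. The sketch below only assembles the command line; it assumes Inspect's `--limit` option for capping the number of samples, and `build_eval_command` is a hypothetical helper name, not part of `inspect_ai` or `inspect_harbor`.

```python
import shlex
from typing import List, Optional

# Task path taken from the CLI example above.
TASK = "inspect_harbor/mmtb_multimedia_terminalbench"


def build_eval_command(model: str, limit: Optional[int] = None) -> List[str]:
    """Return the argv for an `inspect eval` run (hypothetical helper).

    If `limit` is given, the run is capped at that many samples via
    Inspect's `--limit` option (assumed here for smoke testing).
    """
    argv = ["inspect", "eval", TASK, "--model", model]
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive integer")
        argv += ["--limit", str(limit)]
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable command for a 5-sample smoke run.
    print(shlex.join(build_eval_command("openai/gpt-5", limit=5)))
```

Returning an argument list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.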

Dataset information

Harbor registry: mmtb/multimedia-terminalbench
Inspect task: mmtb_multimedia_terminalbench
Latest digest: sha256:0cece363cd8d5809bed7b78574c527aec269add8fe312ff917c493cbdf22c0dd
Samples: 105
Paper: arXiv
Source: https://github.com/mm-tbench/multimedia-terminal-bench

See Task Parameters for the parameter set shared across all Harbor tasks.