mmtb/multimedia-terminalbench
Coding
Multimodal
MultiMedia-TerminalBench (MMTB): a benchmark of 105 realistic multimedia-file tasks in persistent terminal workspaces, across 5 meta-categories grounded in paid practitioner workflows.
Run this task
CLI:
inspect eval inspect_harbor/mmtb_multimedia_terminalbench --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import mmtb_multimedia_terminalbench
eval(mmtb_multimedia_terminalbench(), model="openai/gpt-5")
Dataset information
| Harbor registry | mmtb/multimedia-terminalbench |
| Inspect task | mmtb_multimedia_terminalbench |
| Latest digest | sha256:0cece363cd8d5809bed7b78574c527aec269add8fe312ff917c493cbdf22c0dd |
| Samples | 105 |
| Paper | arxiv |
| Source | https://github.com/mm-tbench/multimedia-terminal-bench |
See Task Parameters for the parameter set shared across all Harbor tasks.
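With 105 samples, it can be useful to smoke-test on a few before a full run. The sketch below assembles the CLI invocation from "Run this task" as an argument list and optionally appends `--limit`, the sample-count flag of `inspect eval`. The helper name `build_eval_command` is hypothetical and used only for illustration; the sketch only builds and prints the command and does not launch the evaluation.

```python
import shlex
from typing import List, Optional

# Task path as given in the CLI example on this page.
TASK = "inspect_harbor/mmtb_multimedia_terminalbench"


def build_eval_command(model: str, limit: Optional[int] = None) -> List[str]:
    """Return the `inspect eval` argument list for this task.

    `build_eval_command` is a hypothetical helper, not part of inspect_ai
    or inspect_harbor. `limit` maps to the `--limit` flag, which restricts
    the run to the first N samples.
    """
    if not model:
        raise ValueError("model must be a non-empty string, e.g. 'openai/gpt-5'")
    cmd = ["inspect", "eval", TASK, "--model", model]
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive integer")
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    # Print a shell-safe command line for a 5-sample smoke test.
    print(shlex.join(build_eval_command("openai/gpt-5", limit=5)))
```

The returned list can be handed to `subprocess.run` once the `inspect` CLI and the `inspect_harbor` package are installed. Passing a list rather than a shell string avoids quoting problems in model names.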