meta/mlgym-bench

Coding

Science

MLGym-Bench: Meta’s framework and benchmark for AI research agents, covering CV, NLP, RL, and game-theory tasks that require ideation, implementation, training, and analysis of ML experiments.

Run this task

CLI:

inspect eval inspect_harbor/meta_mlgym_bench --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import meta_mlgym_bench

eval(meta_mlgym_bench(), model="openai/gpt-5")
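To smoke-test on a subset before running all 12 samples, the CLI invocation can be assembled programmatically with a sample cap. This is a minimal sketch, assuming Inspect's `--limit` option for `inspect eval`; `build_inspect_command` is a hypothetical helper written for illustration and is not part of inspect_ai or inspect_harbor.

```python
import shlex
from typing import List, Optional


def build_inspect_command(task: str, model: str, limit: Optional[int] = None) -> List[str]:
    """Assemble an `inspect eval` argument list (hypothetical helper)."""
    cmd = ["inspect", "eval", task, "--model", model]
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive integer")
        # Assumption: `--limit N` caps the number of samples evaluated.
        cmd += ["--limit", str(limit)]
    return cmd


cmd = build_inspect_command(
    "inspect_harbor/meta_mlgym_bench", "openai/gpt-5", limit=2
)
# Print the shell-quoted command for review before running it.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the printed command looks right.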

Dataset information

Harbor registry	meta/mlgym-bench
Inspect task	`meta_mlgym_bench`
Latest digest	sha256:4637b4f2a71602911c17071a66a039dac70a8b8ce2c582b0e114c9d3adf4b412
Samples	12
Paper	arxiv
Source	https://github.com/facebookresearch/MLGym

See Task Parameters for the parameter set shared across all Harbor tasks.