meta/mlgym-bench

Coding
Science

MLGym-Bench: Meta’s framework and benchmark for AI research agents, covering computer vision (CV), natural language processing (NLP), reinforcement learning (RL), and game-theory tasks that require ideating, implementing, training, and analyzing ML experiments.


Run this task

CLI:

inspect eval inspect_harbor/meta_mlgym_bench --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import meta_mlgym_bench

# Run the MLGym-Bench task with GPT-5 as the agent model
eval(meta_mlgym_bench(), model="openai/gpt-5")

Dataset information

Harbor registry: meta/mlgym-bench
Inspect task: meta_mlgym_bench
Latest digest: sha256:4637b4f2a71602911c17071a66a039dac70a8b8ce2c582b0e114c9d3adf4b412
Samples: 12
Paper: arXiv
Source: https://github.com/facebookresearch/MLGym

See Task Parameters for the parameter set shared across all Harbor tasks.