meta/mlgym-bench
Coding
Science
MLGym-Bench: Meta’s framework and benchmark for AI research agents covering CV, NLP, RL, and game-theory tasks requiring ideation, implementation, training, and analysis of ML experiments.
Run this task
CLI:
inspect eval inspect_harbor/meta_mlgym_bench --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import meta_mlgym_bench
eval(meta_mlgym_bench(), model="openai/gpt-5")
Dataset information
| Field | Value |
| --- | --- |
| Harbor registry | meta/mlgym-bench |
| Inspect task | meta_mlgym_bench |
| Latest digest | sha256:4637b4f2a71602911c17071a66a039dac70a8b8ce2c582b0e114c9d3adf4b412 |
| Samples | 12 |
| Paper | arxiv |
| Source | https://github.com/facebookresearch/MLGym |
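The registry name and the Inspect task name in the table appear to differ only in their separators (`meta/mlgym-bench` versus `meta_mlgym_bench`). The sketch below illustrates that apparent mapping and rebuilds the CLI command from it. Both helper names, `registry_to_task_name` and `build_cli_command`, are hypothetical and not part of `inspect_harbor`; the substitution rule is an assumption inferred from this one entry, not a documented guarantee.

```python
def registry_to_task_name(registry_name: str) -> str:
    """Hypothetical helper: derive an Inspect task name from a Harbor registry name.

    Assumption (inferred from this page only): "/" and "-" both become "_".
    """
    return registry_name.replace("/", "_").replace("-", "_")


def build_cli_command(registry_name: str, model: str) -> str:
    """Hypothetical helper: assemble the `inspect eval` command shown on this page."""
    task_name = registry_to_task_name(registry_name)
    return f"inspect eval inspect_harbor/{task_name} --model {model}"


if __name__ == "__main__":
    # Print the command for this page's registry entry and model.
    print(build_cli_command("meta/mlgym-bench", "openai/gpt-5"))
```

If the mapping does not hold for another registry entry, the Inspect task name listed in that entry's own table should be treated as authoritative.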
See Task Parameters for the parameter set shared across all Harbor tasks.