mmmlu@parity
Knowledge
Reasoning
MMMLU (Multilingual MMLU) parity validation subset with 10 tasks per language across 15 languages (150 tasks total). Evaluates language models’ subject knowledge and reasoning across multiple languages using multiple-choice questions covering 57 academic subjects.
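As a rough illustration of the numbers above, the sketch below checks the sample count (10 tasks × 15 languages) and shows one way multiple-choice results could be broken down per language. It is not part of Inspect or Harbor: the `(language, predicted, target)` record format, the language labels, and both helper functions are hypothetical and exist only for this example.

```python
from collections import defaultdict

LANGUAGES = 15           # languages in the parity subset (from the description)
TASKS_PER_LANGUAGE = 10  # tasks sampled per language (from the description)


def total_samples(languages=LANGUAGES, tasks_per_language=TASKS_PER_LANGUAGE):
    """Number of samples in the subset: languages times tasks per language."""
    return languages * tasks_per_language


def accuracy_by_language(records):
    """Fraction of correct multiple-choice answers for each language.

    `records` is an iterable of (language, predicted, target) tuples.
    This layout is an assumption made for the example, not a Harbor format.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for language, predicted, target in records:
        seen[language] += 1
        # Compare answer letters case-insensitively, ignoring stray whitespace.
        if predicted.strip().upper() == target.strip().upper():
            correct[language] += 1
    return {language: correct[language] / seen[language] for language in seen}


# Hypothetical results; the language labels are placeholders.
records = [
    ("fr", "A", "A"),
    ("fr", "b", "C"),
    ("ja", "D", "D"),
]

print(total_samples())                # 150
print(accuracy_by_language(records))  # {'fr': 0.5, 'ja': 1.0}
```

With only 10 tasks per language, each per-language score moves in steps of 0.1, so a breakdown like this is better suited to spotting gross discrepancies than to ranking languages.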
Run this task
CLI:
inspect eval inspect_harbor/mmmlu_parity --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import mmmlu_parity
eval(mmmlu_parity(), model="openai/gpt-5")
Dataset information
| Harbor registry | mmmlu@parity |
| Inspect task | mmmlu_parity |
| Version | parity |
| Samples | 150 |
| Paper | arxiv |
See Task Parameters for the parameter set shared across all Harbor tasks.