mmmlu@parity

Knowledge
Reasoning

MMLU (Multilingual MMLU) parity validation subset: 10 tasks per language across 15 languages (150 tasks total). It evaluates language models’ subject knowledge and reasoning in multiple languages using multiple-choice questions covering 57 academic subjects.

Run this task

CLI:

inspect eval inspect_harbor/mmmlu_parity --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import mmmlu_parity

eval(mmmlu_parity(), model="openai/gpt-5")

Dataset information

Harbor registry: mmmlu@parity
Inspect task: mmmlu_parity
Version: parity
Samples: 150
Paper: arxiv
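
As a rough illustration of how the sample count follows from the subset layout (15 languages, 10 tasks each), and of how multiple-choice results might be broken down per language, here is a minimal sketch. The record format, the helper names, and the language labels are assumptions made for this example; they are not part of Inspect or Harbor, and the records are made up rather than real results.

```python
from collections import defaultdict

# Figures taken from the task description: 15 languages, 10 tasks each.
NUM_LANGUAGES = 15
TASKS_PER_LANGUAGE = 10


def expected_sample_count(num_languages=NUM_LANGUAGES,
                          tasks_per_language=TASKS_PER_LANGUAGE):
    """Total number of samples in the parity subset."""
    return num_languages * tasks_per_language


def accuracy_by_language(records):
    """Compute the fraction of correct answers for each language.

    `records` is a hypothetical iterable of (language, predicted, target)
    tuples; it is not a format produced by Inspect or Harbor.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, predicted, target in records:
        total[language] += 1
        if predicted == target:
            correct[language] += 1
    return {language: correct[language] / total[language] for language in total}


# Made-up records for illustration only, with placeholder language labels.
example_records = [
    ("lang_a", "A", "A"),
    ("lang_a", "B", "C"),
    ("lang_b", "D", "D"),
]

if __name__ == "__main__":
    print(expected_sample_count())
    print(accuracy_by_language(example_records))
```

With only 10 tasks per language, a per-language accuracy computed this way moves in steps of 0.1, so it is better read as a parity check than as a precise measurement.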

See Task Parameters for the parameter set shared across all Harbor tasks.