actava-ai/chi-bench
Medicine
Professional
χ-Bench is a long-horizon, policy-rich benchmark for U.S. healthcare workflow agents. It spans provider prior authorization, payer utilization management, and care management (78 hub tasks).
Run this task
CLI:
inspect eval inspect_harbor/actava_ai_chi_bench --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import actava_ai_chi_bench
eval(actava_ai_chi_bench(), model="openai/gpt-5")
Dataset information
| Harbor registry | actava-ai/chi-bench |
| Inspect task | actava_ai_chi_bench |
| Latest digest | sha256:bd0b20dd71a286d4c4e7b7f17b3047bd9af572bb78bae3091db8f01a9761b325 |
| Samples | 78 |
| Paper | arxiv |
See Task Parameters for the parameter set shared across all Harbor tasks.
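If you want to pin or log the "Latest digest" value from the table above, it can help to check first that the string is well-formed. The sketch below is illustrative only and is not part of Inspect or Harbor: `parse_digest` is a hypothetical helper, and it assumes the digest follows the common `<algorithm>:<hex>` shape with 64 hex characters for SHA-256. This page does not say which bytes the digest is computed over, so the helper checks only the format and does not verify any content.

```python
import re

# Digest string copied from the "Latest digest" row of the table above.
LATEST_DIGEST = "sha256:bd0b20dd71a286d4c4e7b7f17b3047bd9af572bb78bae3091db8f01a9761b325"

# Assumed shape: "<algorithm>:<lowercase hex>". This is an assumption about
# the format, not a documented Harbor guarantee.
_DIGEST_RE = re.compile(r"^(?P<algo>[a-z0-9]+):(?P<hex>[0-9a-f]+)$")

# Expected number of hex characters per supported algorithm.
_HEX_LENGTHS = {"sha256": 64}


def parse_digest(digest: str) -> tuple[str, str]:
    """Split a digest string into (algorithm, hex) and validate its shape.

    Hypothetical helper: it checks format only and never hashes any data.
    """
    match = _DIGEST_RE.match(digest.strip())
    if match is None:
        raise ValueError(f"not an '<algorithm>:<hex>' digest: {digest!r}")

    algo, hex_value = match["algo"], match["hex"]
    expected_length = _HEX_LENGTHS.get(algo)
    if expected_length is None:
        raise ValueError(f"unsupported digest algorithm: {algo!r}")
    if len(hex_value) != expected_length:
        raise ValueError(
            f"{algo} digest should have {expected_length} hex characters, "
            f"got {len(hex_value)}"
        )
    return algo, hex_value


if __name__ == "__main__":
    algo, hex_value = parse_digest(LATEST_DIGEST)
    print(algo, len(hex_value))
```

A truncated or mistyped digest raises `ValueError`, which catches copy-paste errors before the value is recorded alongside evaluation results.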