sierra-research/tau3-bench

Assistants
Professional
Behavior

tau3-bench is the third generation of τ-bench, extending the original with knowledge and voice. It is a simulation framework for evaluating customer service agents across airline, retail, telecom, and banking knowledge domains.

Run this task

CLI:

inspect eval inspect_harbor/sierra_research_tau3_bench --model openai/gpt-5

Python:

from inspect_ai import eval
from inspect_harbor import sierra_research_tau3_bench

eval(sierra_research_tau3_bench(), model="openai/gpt-5")
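For a quick smoke test before committing to the full dataset, the CLI invocation above can be assembled programmatically with a sample cap. The sketch below is illustrative only: `build_inspect_command` is a hypothetical helper, not part of Inspect or Harbor, and it assumes the standard `--limit` option of `inspect eval` behaves for this task as it does for other Inspect tasks.

```python
import shlex
from typing import List, Optional

# Task path copied from the CLI example above.
TASK = "inspect_harbor/sierra_research_tau3_bench"


def build_inspect_command(model: str, limit: Optional[int] = None) -> List[str]:
    """Assemble the argv for `inspect eval` (hypothetical helper).

    `limit` maps to the `--limit` option, which caps how many samples are
    evaluated; leave it as None to run every sample.
    """
    cmd = ["inspect", "eval", TASK, "--model", model]
    if limit is not None:
        if limit <= 0:
            raise ValueError("limit must be a positive integer")
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    # Print a shell-safe command line; pass the list to subprocess.run to execute it.
    print(shlex.join(build_inspect_command("openai/gpt-5", limit=10)))
```

Building the command as a list rather than a single string avoids shell-quoting problems if it is later passed to `subprocess.run`.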

Dataset information

Harbor registry: sierra-research/tau3-bench
Inspect task: sierra_research_tau3_bench
Latest digest: sha256:a57304f682894ac061090769af771a3617664f3ff6e5417d4eadf8e30433e4d9
Samples: 375
Paper: arxiv
Source: https://github.com/sierra-research/tau2-bench

See Task Parameters for the parameter set shared across all Harbor tasks.