openai/swe-lancer-diamond-ic
Coding
A benchmark of freelance software engineering tasks from Upwork, valued at $1 million USD in total real-world payouts. This is the Individual Contributor (IC) variant, which covers end-to-end engineering tasks.
Run this task
CLI:
inspect eval inspect_harbor/openai_swe_lancer_diamond_ic --model openai/gpt-5
Python:
from inspect_ai import eval
from inspect_harbor import openai_swe_lancer_diamond_ic
eval(openai_swe_lancer_diamond_ic(), model="openai/gpt-5")
Dataset information
| Harbor registry | openai/swe-lancer-diamond-ic |
| Inspect task | openai_swe_lancer_diamond_ic |
| Latest digest | sha256:d0645e1152d417dd3ec8b36c324c03a8729b3fa48c8840f8935f93582c4dce28 |
| Samples | 198 |
| Paper | arxiv |
| Source | https://github.com/openai/SWELancer-Benchmark |
See Task Parameters for the parameter set shared across all Harbor tasks.