# Inspect SWE

> Software engineering agents for Inspect AI.


## Overview

The `inspect_swe` package makes software engineering agents like [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview), [Codex CLI](https://github.com/openai/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), [OpenCode](https://github.com/anomalyco/opencode), and [Mini SWE Agent](https://github.com/SWE-agent/mini-swe-agent) available as standard Inspect agents. For example, here we use the [claude_code()](./reference/index.html.md#claude_code) agent as the solver in an Inspect task:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_qa

from inspect_swe import claude_code

@task
def system_explorer() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=claude_code(),
        scorer=model_graded_qa(),
        sandbox="docker",
    )
```

Inspect SWE agents are implemented using the Inspect [`sandbox_agent_bridge()`](https://inspect.aisi.org.uk/agent-bridge.html#sandbox-bridge).

Agents run inside the sample sandbox and their model API calls are proxied back to Inspect. This means that you can use any model with Inspect SWE agents, and that features like token or time limits and log transcripts work as normal with the agents.

## Getting Started

Install Inspect SWE from PyPI with:

``` bash
pip install inspect-swe
```

Then, try out one or more of the available agents:

| Agent | Description |
|----|----|
| [claude_code()](./claude_code.html.md) | Anthropic’s agentic coding tool [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) |
| [codex_cli()](./codex_cli.html.md) | OpenAI’s terminal-based coding agent [Codex CLI](https://github.com/openai/codex) |
| [gemini_cli()](./gemini_cli.html.md) | Google’s open-source AI agent [Gemini CLI](https://github.com/google-gemini/gemini-cli) |
| [opencode()](./opencode.html.md) | Provider-independent terminal-based coding agent. |
| [mini_swe_agent()](./mini_swe_agent.html.md) | SWE-agent’s minimal 100-line agent. |


# Claude Code – Inspect SWE

## Overview

The `claude_code()` agent uses the unattended mode of Anthropic [Claude Code](https://code.claude.com/docs/en/overview) to execute agentic tasks within the Inspect sandbox. Model API calls that occur in the sandbox are proxied back to Inspect for handling by the model provider for the current task.

> **NOTE: Claude Code Installation**
>
> By default, the agent will download the current stable version of Claude Code and copy it to the sandbox. You can also exercise more explicit control over which version of Claude Code is used—see the [Installation](#installation) section below for details.

## Basic Usage

Use the `claude_code()` agent as you would any Inspect agent. For example, here we use it as the solver in an Inspect task:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_qa

from inspect_swe import claude_code

@task
def system_explorer() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=claude_code(),
        scorer=model_graded_qa(),
        sandbox="docker",
    )
```

You can also pass the agent as a `--solver` on the command line:

``` bash
inspect eval ctf.py --solver inspect_swe/claude_code
```

If you want to try this out locally, see the [system_explorer](https://github.com/meridianlabs-ai/inspect_swe/tree/main/examples/system_explorer/task.py) example.

## Options

The following options are supported for customizing the behavior of the agent:

| Option | Description |
|----|----|
| `system_prompt` | Additional system prompt to append to default system prompt. |
| `skills` | Additional [skills](https://inspect.aisi.org.uk/tools-standard.html#sec-skill) to make available to the agent. |
| `mcp_servers` | MCP servers (see [MCP Servers](#mcp-servers) below for details). |
| `bridged_tools` | Host-side Inspect tools to expose via MCP (see [Bridged Tools](#bridged-tools) below for details). |
| `disallowed_tools` | Optionally disallow tools (e.g. `"WebSearch"`) |
| `centaur` | Run in [Centaur Mode](#centaur-mode), which makes Claude Code available to an Inspect `human_cli()` agent rather than running it unattended. |
| `attempts` | Allow the agent to have multiple scored attempts at solving the task. |
| `model` | Model name to use for agent (defaults to main model for task). |
| `model_config` | Model id used for the identity the agent presents to itself (its “You are powered by the model …” system prompt). Defaults to the real served model. |
| `opus_model` | The model to use for `opus`, or for `opusplan` when Plan Mode is active. Defaults to `model`. |
| `sonnet_model` | The model to use for `sonnet`, or for `opusplan` when Plan Mode is not active. Defaults to `model`. |
| `haiku_model` | The model to use for `haiku`, or for [background functionality](https://code.claude.com/docs/en/costs#background-token-usage). Defaults to `model`. |
| `subagent_model` | The model to use for [subagents](https://code.claude.com/docs/en/sub-agents). Defaults to `model`. |
| `filter` | Filter for intercepting bridged model requests. |
| `retry_refusals` | Should refusals be retried? Defaults to 3. |
| `retry_uncaught_errors` | Should uncaught errors (unexpected crashes of Claude Code) be retried? Defaults to 3. |
| `cwd` | Working directory for Claude Code session. |
| `env` | Environment variables to set for Claude Code. |
| `version` | Version of Claude Code to use (see [Installation](#installation) below for details) |

For example, here we specify a custom system prompt and disallow the `WebSearch` tool:

``` python
claude_code(
    system_prompt="You are an ace system researcher.",
    disallowed_tools=["WebSearch"]
)
```

## MCP Servers

You can specify one or more [Model Context Protocol](https://modelcontextprotocol.io/docs/getting-started/intro) (MCP) servers to provide additional tools to Claude Code. Servers are specified via the [`MCPServerConfig`](https://inspect.aisi.org.uk/reference/inspect_ai.tool.html#mcpserverconfig) class and its Stdio and HTTP variants.

For example, here is a Dockerfile that makes the `server-memory` MCP server available in the sandbox container:

``` dockerfile
FROM python:3.12-bookworm

# nodejs (required by mcp server)
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
    && apt-get install -y --no-install-recommends nodejs \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# memory mcp server
RUN npx --yes @modelcontextprotocol/server-memory --version

# run forever
CMD ["tail", "-f", "/dev/null"]
```

Note that we run the `npx` server during the build of the Dockerfile so that it is cached for use offline (below we’ll run it with the `--offline` option).

We can then use this MCP server in a task as follows:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.tool import MCPServerConfigStdio
from inspect_swe import claude_code

@task
def investigator() -> Task:
    return Task(
        dataset=[
            Sample(
                input="What transport protocols are supported in "
                + "the 2025-03-26 version of the MCP spec?"
            )
        ],
        solver=claude_code(
            system_prompt="Please use the web search tool to "
            + "research this question and the memory tools "
            + "to keep track of your research.",
            mcp_servers=[
                MCPServerConfigStdio(
                    name="memory",
                    command="npx",
                    args=[
                        "--offline",
                        "@modelcontextprotocol/server-memory"
                    ],
                )
            ]
        ),
        sandbox=("docker", "Dockerfile"),
    )
```

Note that we run the MCP server using the `--offline` option so that it doesn’t require an internet connection (which it would normally use to check for updates to the package).

## Bridged Tools

You can expose host-side Inspect tools to the sandboxed agent via the MCP protocol using the `bridged_tools` parameter. This allows you to run tools on the host (e.g. tools that access host resources, databases, or APIs) but make them available to the agent running inside the sandbox.

Tools are specified via [`BridgedToolsSpec`](https://inspect.aisi.org.uk/reference/inspect_ai.agent.html#bridgedtoolsspec) which wraps a list of Inspect tools:

``` python
from inspect_ai import Task, task
from inspect_ai.agent import BridgedToolsSpec
from inspect_ai.dataset import Sample
from inspect_ai.tool import tool
from inspect_swe import claude_code

@tool
def search_database():
    async def execute(query: str) -> str:
        """Search the internal database.

        Args:
            query: The search query.
        """
        # This runs on the host, not in the sandbox
        return f"Results for: {query}"
    return execute

@task
def investigator() -> Task:
    return Task(
        dataset=[
            Sample(input="Search for information about MCP protocols.")
        ],
        solver=claude_code(
            system_prompt="Use the search tool to research.",
            bridged_tools=[
                BridgedToolsSpec(
                    name="host_tools",
                    tools=[search_database()]
                )
            ]
        ),
        sandbox=("docker", "Dockerfile"),
    )
```

The `name` field identifies the MCP server and will be visible to the agent as a tool prefix. You can specify multiple `BridgedToolsSpec` instances to create separate MCP servers for different tool groups.
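
As a rough illustration of what such a prefix looks like: Claude Code's own documentation names MCP tools `mcp__<server>__<tool>`. Assuming bridged tools follow that convention (this page does not spell out the exact format), the hypothetical helper below, which is not part of `inspect_swe`, builds the name you would expect to see in a transcript for the example above.

```python
def mcp_tool_name(server: str, tool: str) -> str:
    """Build a Claude Code style MCP tool name (assumed naming convention)."""
    return f"mcp__{server}__{tool}"


# "host_tools" is the BridgedToolsSpec name and "search_database" the tool
# from the example above.
name = mcp_tool_name("host_tools", "search_database")
print(name)
```

If the convention holds, this is also the kind of name you would use to refer to a single bridged tool elsewhere, for instance in `disallowed_tools`.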

See the [Bridged Tools](https://inspect.aisi.org.uk/agent-bridge.html#bridged-tools) documentation for more details on the architecture and how tool execution flows between host and sandbox.

## Installation

By default, the agent will download the current stable version of Claude Code and copy it to the sandbox. You can override this behavior using the `version` option:

| Option | Description |
|----|----|
| `"auto"` | Use any available version of Claude Code in the sandbox, otherwise download the current stable version. |
| `"sandbox"` | Use the version of Claude Code in the sandbox (raises `RuntimeError` if not available in the sandbox) |
| `"stable"` | Download and use the current stable version. |
| `"latest"` | Download and use the very latest version. |
| `"x.x.x"` | Download and use a specific version number. |
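
The table can be read as a small decision procedure. The following is a minimal sketch of that reading, not the package's actual implementation; `resolve_version` and its return strings are hypothetical names used only for illustration.

```python
from typing import Optional


def resolve_version(requested: str, in_sandbox: Optional[str]) -> str:
    """Describe the action implied by the version table (illustrative only)."""
    if requested == "auto":
        # Prefer whatever is already installed in the sandbox.
        if in_sandbox is not None:
            return f"use sandbox {in_sandbox}"
        return "download stable"
    if requested == "sandbox":
        # Never download: a missing binary is an error.
        if in_sandbox is None:
            raise RuntimeError("Claude Code is not available in the sandbox")
        return f"use sandbox {in_sandbox}"
    # "stable", "latest", or an explicit version number such as "1.2.3".
    return f"download {requested}"


print(resolve_version("auto", None))
print(resolve_version("auto", "1.2.3"))
```

The practical difference between `"auto"` and `"sandbox"` is the fallback: `"auto"` may reach the network, while `"sandbox"` fails fast.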

If you don’t ever want to rely on automatic downloads of Claude Code (e.g. if you run your evaluations offline), you can use one of two approaches:

1.  Pre-install the version of Claude Code you want to use in the sandbox, then use `version="sandbox"`:

    ``` python
    claude_code(version="sandbox")
    ```

2.  Download the version of Claude Code you want to use into the cache, then specify that version explicitly:

    ``` python
    from inspect_swe import claude_code, download_agent_binary

    # download the agent binary during installation/configuration
    download_agent_binary("claude_code", "0.29.0", "linux-x64")

    # reference that version in your task (no download will occur)
    claude_code(version="0.29.0")
    ```

    Note that the 5 most recently downloaded versions are retained in the cache. Use the [cached_agent_binaries()](./reference/index.html.md#cached_agent_binaries) function to list the contents of the cache.

## Centaur Mode

The `claude_code()` agent can also be run in “centaur” mode which uses the Inspect AI [Human Agent](https://inspect.aisi.org.uk/human-agent.html) as the solver and makes [Claude Code](https://code.claude.com/docs/en/overview) available to the human user for help with the task. So rather than strictly measuring human vs. model performance, you are able to measure performance of humans working collaboratively with a model.

Enable centaur mode by passing `centaur=True` to the `claude_code()` agent:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_qa

from inspect_swe import claude_code

@task
def system_explorer() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=claude_code(centaur=True),
        scorer=model_graded_qa(),
        sandbox="docker",
    )
```

You can also enable centaur mode from the CLI using a solver arg (`-S`):

``` bash
inspect eval ctf.py --solver inspect_swe/claude_code -S centaur=true
```

You can also pass `CentaurOptions` to further customize the behavior of the human agent. For example:

``` python
from inspect_swe import CentaurOptions

Task(
    dataset=json_dataset("dataset.json"),
    solver=claude_code(centaur=CentaurOptions(answer=False)),
    scorer=model_graded_qa(),
    sandbox="docker",
)
```

See the [human_cli()](https://inspect.aisi.org.uk/reference/inspect_ai.agent.html#human_cli) documentation for details on available options.

## Troubleshooting

If Claude Code doesn’t appear to be running, or isn’t behaving as expected, you can troubleshoot by dumping the Claude Code debug log after an evaluation task is complete. You can do this with:

``` bash
inspect trace dump --filter "Claude Code"
```


# Codex CLI – Inspect SWE

## Overview

The `codex_cli()` agent uses the unattended mode of OpenAI [Codex CLI](https://github.com/openai/codex) to execute agentic tasks within the Inspect sandbox. Model API calls that occur in the sandbox are proxied back to Inspect for handling by the model provider for the current task.

> **NOTE: Codex CLI Installation**
>
> By default, the agent will download the current stable version of Codex CLI and copy it to the sandbox. You can also exercise more explicit control over which version of Codex CLI is used—see the [Installation](#installation) section below for details.

## Basic Usage

Use the `codex_cli()` agent as you would any Inspect agent. For example, here we use it as the solver in an Inspect task:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_qa

from inspect_swe import codex_cli

@task
def system_explorer() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=codex_cli(),
        scorer=model_graded_qa(),
        sandbox="docker",
    )
```

You can also pass the agent as a `--solver` on the command line:

``` bash
inspect eval ctf.py --solver inspect_swe/codex_cli
```

If you want to try this out locally, see the [system_explorer](https://github.com/meridianlabs-ai/inspect_swe/tree/main/examples/system_explorer/task.py) example.

## Options

The following options are supported for customizing the behavior of the agent:

| Option | Description |
|----|----|
| `system_prompt` | Additional system prompt to append to default system prompt. |
| `model_config` | Codex model slug used to select the system prompt and tool set. Defaults to `None`, which derives the slug from the model used by the agent so Codex’s prompt/tooling aligns with what’s actually running. |
| `skills` | Additional [skills](https://inspect.aisi.org.uk/tools-standard.html#sec-skill) to make available to the agent. |
| `mcp_servers` | MCP servers (see [MCP Servers](#mcp-servers) below for details). |
| `bridged_tools` | Host-side Inspect tools to expose via MCP (see [Bridged Tools](#bridged-tools) below for details). |
| `web_search` | Web search mode. Use `"live"` for live web search, `"cached"` for cached web search, or `"disabled"` to disable web search. Defaults to `"live"`. |
| `goals` | Enable Codex goal tools. Defaults to `True`. |
| `centaur` | Run in [Centaur Mode](#centaur-mode), which makes Codex CLI available to an Inspect `human_cli()` agent rather than running it unattended. |
| `attempts` | Allow the agent to have multiple scored attempts at solving the task. |
| `model` | Model name to use for agent (defaults to main model for task). |
| `filter` | Filter for intercepting bridged model requests. |
| `retry_refusals` | Should refusals be retried? (pass number of times to retry) |
| `home_dir` | Home directory to use for Codex CLI. When set, `AGENTS.md` and the MCP configuration are written here rather than to `.codex`. |
| `cwd` | Working directory for Codex CLI session. |
| `env` | Environment variables to set for Codex CLI. |
| `version` | Version of Codex CLI to use (see [Installation](#installation) below for details) |
| `config_overrides` | Additional Codex CLI configuration overrides. |

For example, here we specify a custom system prompt and disable the web search and goals tools:

``` python
codex_cli(
    system_prompt="You are an ace system researcher.",
    web_search="disabled",
    goals=False,
)
```

## MCP Servers

You can specify one or more [Model Context Protocol](https://modelcontextprotocol.io/docs/getting-started/intro) (MCP) servers to provide additional tools to Codex CLI. Servers are specified via the [`MCPServerConfig`](https://inspect.aisi.org.uk/reference/inspect_ai.tool.html#mcpserverconfig) class and its Stdio and HTTP variants.

For example, here is a Dockerfile that makes the `server-memory` MCP server available in the sandbox container:

``` dockerfile
FROM python:3.12-bookworm

# nodejs (required by mcp server)
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
    && apt-get install -y --no-install-recommends nodejs \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# memory mcp server
RUN npx --yes @modelcontextprotocol/server-memory --version

# run forever
CMD ["tail", "-f", "/dev/null"]
```

Note that we run the `npx` server during the build of the Dockerfile so that it is cached for use offline (below we’ll run it with the `--offline` option).

We can then use this MCP server in a task as follows:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.tool import MCPServerConfigStdio
from inspect_swe import codex_cli

@task
def investigator() -> Task:
    return Task(
        dataset=[
            Sample(
                input="What transport protocols are supported in "
                + "the 2025-03-26 version of the MCP spec?"
            )
        ],
        solver=codex_cli(
            system_prompt="Please use the web search tool to "
            + "research this question and the memory tools "
            + "to keep track of your research.",
            mcp_servers=[
                MCPServerConfigStdio(
                    name="memory",
                    command="npx",
                    args=[
                        "--offline",
                        "@modelcontextprotocol/server-memory"
                    ],
                )
            ]
        ),
        sandbox=("docker", "Dockerfile"),
    )
```

Note that we run the MCP server using the `--offline` option so that it doesn’t require an internet connection (which it would normally use to check for updates to the package).

## Bridged Tools

You can expose host-side Inspect tools to the sandboxed agent via the MCP protocol using the `bridged_tools` parameter. This allows you to run tools on the host (e.g. tools that access host resources, databases, or APIs) but make them available to the agent running inside the sandbox.

Tools are specified via [`BridgedToolsSpec`](https://inspect.aisi.org.uk/reference/inspect_ai.agent.html#bridgedtoolsspec) which wraps a list of Inspect tools:

``` python
from inspect_ai import Task, task
from inspect_ai.agent import BridgedToolsSpec
from inspect_ai.dataset import Sample
from inspect_ai.tool import tool
from inspect_swe import codex_cli

@tool
def search_database():
    async def execute(query: str) -> str:
        """Search the internal database.

        Args:
            query: The search query.
        """
        # This runs on the host, not in the sandbox
        return f"Results for: {query}"
    return execute

@task
def investigator() -> Task:
    return Task(
        dataset=[
            Sample(input="Search for information about MCP protocols.")
        ],
        solver=codex_cli(
            system_prompt="Use the search tool to research.",
            bridged_tools=[
                BridgedToolsSpec(
                    name="host_tools",
                    tools=[search_database()]
                )
            ]
        ),
        sandbox=("docker", "Dockerfile"),
    )
```

The `name` field identifies the MCP server and will be visible to the agent as a tool prefix. You can specify multiple `BridgedToolsSpec` instances to create separate MCP servers for different tool groups.

See the [Bridged Tools](https://inspect.aisi.org.uk/agent-bridge.html#bridged-tools) documentation for more details on the architecture and how tool execution flows between host and sandbox.

## Installation

By default, the agent will download the current stable version of Codex CLI and copy it to the sandbox. You can override this behavior using the `version` option:

| Option | Description |
|----|----|
| `"auto"` | Use any available version of Codex CLI in the sandbox, otherwise download the latest version. |
| `"sandbox"` | Use the version of Codex CLI in the sandbox (raises `RuntimeError` if not available in the sandbox) |
| `"latest"` | Download and use the very latest version. |
| `"x.x.x"` | Download and use a specific version number. |

If you don’t ever want to rely on automatic downloads of Codex CLI (e.g. if you run your evaluations offline), you can use one of two approaches:

1.  Pre-install the version of Codex CLI you want to use in the sandbox, then use `version="sandbox"`:

    ``` python
    codex_cli(version="sandbox")
    ```

2.  Download the version of Codex CLI you want to use into the cache, then specify that version explicitly:

    ``` python
    from inspect_swe import codex_cli, download_agent_binary

    # download the agent binary during installation/configuration
    download_agent_binary("codex_cli", "0.29.0", "linux-x64")

    # reference that version in your task (no download will occur)
    codex_cli(version="0.29.0")
    ```

    Note that the 5 most recently downloaded versions are retained in the cache. Use the [cached_agent_binaries()](./reference/index.html.md#cached_agent_binaries) function to list the contents of the cache.
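
The retention rule matters if you pin an older version: once five newer versions have been downloaded, the pinned one may no longer be in the cache. The sketch below illustrates that behavior with a plain ordered dictionary. It is an assumption about the semantics of "5 most recently downloaded", not the package's cache code, and `record_download` and the paths are hypothetical.

```python
from collections import OrderedDict

MAX_CACHED = 5  # retention count stated in the note above


def record_download(cache: "OrderedDict[str, str]", version: str, path: str) -> None:
    """Add a version to the cache, evicting the least recently downloaded."""
    cache.pop(version, None)  # downloading again moves a version to the end
    cache[version] = path
    while len(cache) > MAX_CACHED:
        cache.popitem(last=False)  # drop the oldest entry


cache: "OrderedDict[str, str]" = OrderedDict()
for v in ["0.24.0", "0.25.0", "0.26.0", "0.27.0", "0.28.0", "0.29.0"]:
    record_download(cache, v, f"/hypothetical/cache/codex-{v}")

print(list(cache))
```

After the sixth download, `0.24.0` has been evicted, so a task pinned to it would need to download it again.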

## Centaur Mode

The `codex_cli()` agent can also be run in “centaur” mode which uses the Inspect AI [Human Agent](https://inspect.aisi.org.uk/human-agent.html) as the solver and makes [Codex CLI](https://github.com/openai/codex) available to the human user for help with the task. So rather than strictly measuring human vs. model performance, you are able to measure performance of humans working collaboratively with a model.

Enable centaur mode by passing `centaur=True` to the `codex_cli()` agent:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_qa

from inspect_swe import codex_cli

@task
def system_explorer() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=codex_cli(centaur=True),
        scorer=model_graded_qa(),
        sandbox="docker",
    )
```

You can also enable centaur mode from the CLI using a solver arg (`-S`):

``` bash
inspect eval ctf.py --solver inspect_swe/codex_cli -S centaur=true
```

You can also pass `CentaurOptions` to further customize the behavior of the human agent. For example:

``` python
from inspect_swe import CentaurOptions

Task(
    dataset=json_dataset("dataset.json"),
    solver=codex_cli(centaur=CentaurOptions(answer=False)),
    scorer=model_graded_qa(),
    sandbox="docker",
)
```

See the [human_cli()](https://inspect.aisi.org.uk/reference/inspect_ai.agent.html#human_cli) documentation for details on available options.

## Troubleshooting

If Codex CLI doesn’t appear to be running, or isn’t behaving as expected, you can troubleshoot by dumping the Codex CLI debug log after an evaluation task is complete. You can do this with:

``` bash
inspect trace dump --filter "Codex CLI"
```


# Gemini CLI – Inspect SWE

## Overview

The `gemini_cli()` agent uses the unattended mode of Google [Gemini CLI](https://github.com/google-gemini/gemini-cli) to execute agentic tasks within the Inspect sandbox. Model API calls that occur in the sandbox are proxied back to Inspect for handling by the model provider for the current task.

> **NOTE: Gemini CLI Installation**
>
> By default, the agent will download the current stable version of Gemini CLI and copy it to the sandbox. You can also exercise more explicit control over which version of Gemini CLI is used—see the [Installation](#installation) section below for details.

## Basic Usage

Use the `gemini_cli()` agent as you would any Inspect agent. For example, here we use it as the solver in an Inspect task:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_qa

from inspect_swe import gemini_cli

@task
def system_explorer() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=gemini_cli(),
        scorer=model_graded_qa(),
        sandbox="docker",
    )
```

You can also pass the agent as a `--solver` on the command line:

``` bash
inspect eval ctf.py --solver inspect_swe/gemini_cli
```

If you want to try this out locally, see the [system_explorer](https://github.com/meridianlabs-ai/inspect_swe/tree/main/examples/system_explorer/task.py) example.

## Options

The following options are supported for customizing the behavior of the agent:

| Option | Description |
|----|----|
| `system_prompt` | Additional system prompt to append to default system prompt. |
| `skills` | Additional [skills](https://inspect.aisi.org.uk/tools-standard.html#sec-skill) to make available to the agent. |
| `mcp_servers` | MCP servers (see [MCP Servers](#mcp-servers) below for details). |
| `bridged_tools` | Host-side Inspect tools to expose via MCP (see [Bridged Tools](#bridged-tools) below for details). |
| `centaur` | Run in [Centaur Mode](#centaur-mode), which makes Gemini CLI available to an Inspect `human_cli()` agent rather than running it unattended. |
| `attempts` | Allow the agent to have multiple scored attempts at solving the task. |
| `model` | Model name to use for agent (defaults to main model for task). |
| `gemini_model` | Gemini model name to pass to CLI. This bypasses the auto-router. Use `"gemini-2.5-pro"` (default) or `"gemini-2.5-flash"`. |
| `filter` | Filter for intercepting bridged model requests. |
| `retry_refusals` | Should refusals be retried? (pass number of times to retry) |
| `cwd` | Working directory for Gemini CLI session. |
| `env` | Environment variables to set for Gemini CLI. |
| `version` | Version of Gemini CLI to use (see [Installation](#installation) below for details) |

For example, here we specify a custom system prompt:

``` python
gemini_cli(
    system_prompt="You are an ace system researcher."
)
```

## MCP Servers

You can specify one or more [Model Context Protocol](https://modelcontextprotocol.io/docs/getting-started/intro) (MCP) servers to provide additional tools to Gemini CLI. Servers are specified via the [`MCPServerConfig`](https://inspect.aisi.org.uk/reference/inspect_ai.tool.html#mcpserverconfig) class and its Stdio and HTTP variants.

For example, here is a Dockerfile that makes the `server-memory` MCP server available in the sandbox container:

``` dockerfile
FROM python:3.12-bookworm

# nodejs (required by mcp server)
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
    && apt-get install -y --no-install-recommends nodejs \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# memory mcp server
RUN npx --yes @modelcontextprotocol/server-memory --version

# run forever
CMD ["tail", "-f", "/dev/null"]
```

Note that we run the `npx` server during the build of the Dockerfile so that it is cached for use offline (below we’ll run it with the `--offline` option).

We can then use this MCP server in a task as follows:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.tool import MCPServerConfigStdio
from inspect_swe import gemini_cli

@task
def investigator() -> Task:
    return Task(
        dataset=[
            Sample(
                input="What transport protocols are supported in "
                + "the 2025-03-26 version of the MCP spec?"
            )
        ],
        solver=gemini_cli(
            system_prompt="Please use the web search tool to "
            + "research this question and the memory tools "
            + "to keep track of your research.",
            mcp_servers=[
                MCPServerConfigStdio(
                    name="memory",
                    command="npx",
                    args=[
                        "--offline",
                        "@modelcontextprotocol/server-memory"
                    ],
                )
            ]
        ),
        sandbox=("docker", "Dockerfile"),
    )
```

Note that we run the MCP server using the `--offline` option so that it doesn’t require an internet connection (which it would normally use to check for updates to the package).

## Bridged Tools

You can expose host-side Inspect tools to the sandboxed agent via the MCP protocol using the `bridged_tools` parameter. This allows you to run tools on the host (e.g. tools that access host resources, databases, or APIs) but make them available to the agent running inside the sandbox.

Tools are specified via [`BridgedToolsSpec`](https://inspect.aisi.org.uk/reference/inspect_ai.agent.html#bridgedtoolsspec) which wraps a list of Inspect tools:

``` python
from inspect_ai import Task, task
from inspect_ai.agent import BridgedToolsSpec
from inspect_ai.dataset import Sample
from inspect_ai.tool import tool
from inspect_swe import gemini_cli

@tool
def search_database():
    async def execute(query: str) -> str:
        """Search the internal database.

        Args:
            query: The search query.
        """
        # This runs on the host, not in the sandbox
        return f"Results for: {query}"
    return execute

@task
def investigator() -> Task:
    return Task(
        dataset=[
            Sample(input="Search for information about MCP protocols.")
        ],
        solver=gemini_cli(
            system_prompt="Use the search tool to research.",
            bridged_tools=[
                BridgedToolsSpec(
                    name="host_tools",
                    tools=[search_database()]
                )
            ]
        ),
        sandbox=("docker", "Dockerfile"),
    )
```

The `name` field identifies the MCP server and will be visible to the agent as a tool prefix. You can specify multiple `BridgedToolsSpec` instances to create separate MCP servers for different tool groups.

See the [Bridged Tools](https://inspect.aisi.org.uk/agent-bridge.html#bridged-tools) documentation for more details on the architecture and how tool execution flows between host and sandbox.
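
Because the tool body runs on the host, it can be exercised directly before it is bridged. The sketch below repeats the `execute` coroutine from the example above without the `@tool` decorator (an assumption made here so that the snippet has no dependencies) and calls it with `asyncio.run()`.

```python
import asyncio


def search_database():
    # Same factory shape as the example above, minus the @tool decorator.
    async def execute(query: str) -> str:
        """Search the internal database.

        Args:
            query: The search query.
        """
        return f"Results for: {query}"

    return execute


# Call the coroutine directly on the host to sanity-check its output.
result = asyncio.run(search_database()("MCP protocols"))
print(result)
```

Checking the tool in isolation like this helps separate bugs in the tool itself from problems in the bridge between host and sandbox.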

## Installation

By default, the agent will download the current stable version of Gemini CLI and copy it to the sandbox. You can override this behavior using the `version` option:

| Option | Description |
|----|----|
| `"auto"` | Use any available version of Gemini CLI in the sandbox, otherwise download the latest version. |
| `"sandbox"` | Use the version of Gemini CLI in the sandbox (raises `RuntimeError` if not available in the sandbox) |
| `"latest"` | Download and use the very latest version. |
| `"x.x.x-preview.y"` | Download and use a specific version number. |
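
If version strings come from configuration files or CI variables, it can help to check them before starting an evaluation. The following is a minimal sketch that recognizes only the forms listed in the table plus plain `x.x.x` numbers (as used in the download example on this page); `classify_version` is a hypothetical helper, and the package itself may accept other forms.

```python
import re

# Matches "x.x.x" and "x.x.x-preview.y".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-preview\.(\d+))?$")
KEYWORDS = {"auto", "sandbox", "latest"}


def classify_version(value: str) -> str:
    """Return "keyword", "release", or "preview"; raise on anything else."""
    if value in KEYWORDS:
        return "keyword"
    match = VERSION_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized version: {value!r}")
    return "preview" if match.group(4) is not None else "release"


print(classify_version("0.29.0"))
print(classify_version("0.30.0-preview.1"))
```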

If you don’t ever want to rely on automatic downloads of Gemini CLI (e.g. if you run your evaluations offline), you can use one of two approaches:

1.  Pre-install the version of Gemini CLI you want to use in the sandbox, then use `version="sandbox"`:

    ``` python
    gemini_cli(version="sandbox")
    ```

2.  Download the version of Gemini CLI you want to use into the cache, then specify that version explicitly:

    ``` python
    from inspect_swe import download_agent_binary, gemini_cli

    # download the agent binary during installation/configuration
    download_agent_binary("gemini_cli", "0.29.0", "linux-x64")

    # reference that version in your task (no download will occur)
    gemini_cli(version="0.29.0")
    ```

    Note that the 5 most recently downloaded versions are retained in the cache. Use the [cached_agent_binaries()](./reference/index.html.md#cached_agent_binaries) function to list the contents of the cache.

## Centaur Mode

The `gemini_cli()` agent can also be run in “centaur” mode, which uses the Inspect AI [Human Agent](https://inspect.aisi.org.uk/human-agent.html) as the solver and makes [Gemini CLI](https://github.com/google-gemini/gemini-cli) available to the human user for help with the task. Rather than strictly measuring human vs. model performance, you can measure the performance of humans working collaboratively with a model.

Enable centaur mode by passing `centaur=True` to the `gemini_cli()` agent:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_qa

from inspect_swe import gemini_cli

@task
def system_explorer() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=gemini_cli(centaur=True),
        scorer=model_graded_qa(),
        sandbox="docker",
    )
```

You can also enable centaur mode from the CLI using a solver arg (`-S`):

``` bash
inspect eval ctf.py --solver inspect_swe/gemini_cli -S centaur=true
```

You can also pass `CentaurOptions` to further customize the behavior of the human agent. For example:

``` python
from inspect_swe import CentaurOptions

Task(
    dataset=json_dataset("dataset.json"),
    solver=gemini_cli(centaur=CentaurOptions(answer=False)),
    scorer=model_graded_qa(),
    sandbox="docker",
)
```

See the [human_cli()](https://inspect.aisi.org.uk/reference/inspect_ai.agent.html#human_cli) documentation for details on available options.

## Troubleshooting

If Gemini CLI doesn’t appear to be working, or isn’t working as expected, you can troubleshoot by dumping the Gemini CLI debug log after an evaluation task is complete. You can do this with:

``` bash
inspect trace dump --filter "Gemini CLI"
```


# OpenCode – Inspect SWE

## Overview

The `opencode()` agent uses the unattended mode of Anomaly [OpenCode](https://github.com/anomalyco/opencode) to execute agentic tasks within the Inspect sandbox. Model API calls that occur in the sandbox are proxied back to Inspect for handling by the model provider for the current task.

> **NOTE: OpenCode Installation**
>
> By default, the agent will download the current stable version of OpenCode and copy it to the sandbox. You can also exercise more explicit control over which version of OpenCode is used—see the [Installation](#installation) section below for details.

## Basic Usage

Use the `opencode()` agent as you would any Inspect agent. For example, here we use it as the solver in an Inspect task:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_qa

from inspect_swe import opencode

@task
def system_explorer() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=opencode(),
        scorer=model_graded_qa(),
        sandbox="docker",
    )
```

You can also pass the agent as a `--solver` on the command line:

``` bash
inspect eval ctf.py --solver inspect_swe/opencode
```

If you want to try this out locally, see the [system_explorer](https://github.com/meridianlabs-ai/inspect_swe/tree/main/examples/system_explorer/task.py) example.

## Options

The following options are supported for customizing the behavior of the agent:

| Option | Description |
|----|----|
| `system_prompt` | Additional system prompt to append to default system prompt. |
| `skills` | Additional [skills](https://inspect.aisi.org.uk/tools-standard.html#sec-skill) to make available to the agent. |
| `mcp_servers` | MCP servers (see [MCP Servers](#mcp-servers) below for details). |
| `bridged_tools` | Host-side Inspect tools to expose via MCP (see [Bridged Tools](#bridged-tools) below for details). |
| `centaur` | Run in [Centaur Mode](#centaur-mode), which makes OpenCode available to an Inspect `human_cli()` agent rather than running it unattended. |
| `attempts` | Allow the agent to have multiple scored attempts at solving the task. |
| `model` | Model name to use for agent (defaults to main model for task). |
| `opencode_model` | OpenCode model identifier (`provider/model`) passed to the CLI. Default: `"anthropic/claude-sonnet-4-5"`. The actual model calls still go through the Inspect bridge; this just selects which provider client OpenCode uses to format the request. |
| `filter` | Filter for intercepting bridged model requests. |
| `retry_refusals` | Should refusals be retried? (pass number of times to retry) |
| `cwd` | Working directory for OpenCode session. |
| `env` | Environment variables to set for OpenCode. |
| `version` | Version of OpenCode to use (see [Installation](#installation) below for details) |

For example, here we specify a custom system prompt:

``` python
opencode(
    system_prompt="You are an ace system researcher."
)
```

## MCP Servers

You can specify one or more [Model Context Protocol](https://modelcontextprotocol.io/docs/getting-started/intro) (MCP) servers to provide additional tools to OpenCode. Servers are specified via the [`MCPServerConfig`](https://inspect.aisi.org.uk/reference/inspect_ai.tool.html#mcpserverconfig) class and its Stdio and HTTP variants.

For example, here is a Dockerfile that makes the `server-memory` MCP server available in the sandbox container:

``` dockerfile
FROM python:3.12-bookworm

# nodejs (required by mcp server)
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
    && apt-get install -y --no-install-recommends nodejs \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# memory mcp server
RUN npx --yes @modelcontextprotocol/server-memory --version

# run forever
CMD ["tail", "-f", "/dev/null"]
```

Note that we run the `npx` server during the build of the Dockerfile so that it is cached for use offline (below we’ll run it with the `--offline` option).

We can then use this MCP server in a task as follows:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.tool import MCPServerConfigStdio
from inspect_swe import opencode

@task
def investigator() -> Task:
    return Task(
        dataset=[
            Sample(
                input="What transport protocols are supported in "
                + "the 2025-03-26 version of the MCP spec?"
            )
        ],
        solver=opencode(
            system_prompt="Please use the web search tool to "
            + "research this question and the memory tools "
            + "to keep track of your research.",
            mcp_servers=[
                MCPServerConfigStdio(
                    name="memory",
                    command="npx",
                    args=[
                        "--offline",
                        "@modelcontextprotocol/server-memory"
                    ],
                )
            ]
        ),
        sandbox=("docker", "Dockerfile"),
    )
```

Note that we run the MCP server using the `--offline` option so that it doesn’t require an internet connection (which it would normally use to check for updates to the package).
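
To make the offline setup concrete, the following sketch shows the command line that a stdio server entry like the one above corresponds to. `StdioServer` is a hypothetical stand-in for `MCPServerConfigStdio` that models only the `name`, `command`, and `args` fields used in the task, so treat it as an illustration rather than Inspect's implementation.

```python
import shlex
from dataclasses import dataclass, field

@dataclass
class StdioServer:
    """Hypothetical stand-in for a stdio MCP server entry."""
    name: str
    command: str
    args: list[str] = field(default_factory=list)

    def command_line(self) -> str:
        # Quote each part so the result is safe to paste into a shell.
        return shlex.join([self.command, *self.args])

memory = StdioServer(
    name="memory",
    command="npx",
    args=["--offline", "@modelcontextprotocol/server-memory"],
)
print(memory.command_line())
```

Running the printed command by hand inside the sandbox container can be a quick way to check that the package was cached during the image build.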

## Bridged Tools

You can expose host-side Inspect tools to the sandboxed agent via the MCP protocol using the `bridged_tools` parameter. This allows you to run tools on the host (e.g. tools that access host resources, databases, or APIs) but make them available to the agent running inside the sandbox.

Tools are specified via [`BridgedToolsSpec`](https://inspect.aisi.org.uk/reference/inspect_ai.agent.html#bridgedtoolsspec) which wraps a list of Inspect tools:

``` python
from inspect_ai import Task, task
from inspect_ai.agent import BridgedToolsSpec
from inspect_ai.dataset import Sample
from inspect_ai.tool import tool
from inspect_swe import opencode

@tool
def search_database():
    async def execute(query: str) -> str:
        """Search the internal database.

        Args:
            query: The search query.
        """
        # This runs on the host, not in the sandbox
        return f"Results for: {query}"
    return execute

@task
def investigator() -> Task:
    return Task(
        dataset=[
            Sample(input="Search for information about MCP protocols.")
        ],
        solver=opencode(
            system_prompt="Use the search tool to research.",
            bridged_tools=[
                BridgedToolsSpec(
                    name="host_tools",
                    tools=[search_database()]
                )
            ]
        ),
        sandbox=("docker", "Dockerfile"),
    )
```

The `name` field identifies the MCP server and will be visible to the agent as a tool prefix. You can specify multiple `BridgedToolsSpec` instances to create separate MCP servers for different tool groups.

See the [Bridged Tools](https://inspect.aisi.org.uk/agent-bridge.html#bridged-tools) documentation for more details on the architecture and how tool execution flows between host and sandbox.

## Installation

By default, the agent will download the latest version of OpenCode and copy it to the sandbox. You can override this behaviour using the `version` option:

| Option | Description |
|----|----|
| `"auto"` | Use any available version of OpenCode in the sandbox, otherwise download the latest version. |
| `"sandbox"` | Use the version of OpenCode in the sandbox (raises `RuntimeError` if not available in the sandbox) |
| `"latest"` | Download and use the very latest version. |
| `"x.x.x"` | Download and use a specific version number. |

If you don’t ever want to rely on automatic downloads of OpenCode (e.g. if you run your evaluations offline), you can use one of two approaches:

1.  Pre-install the version of OpenCode you want to use in the sandbox, then use `version="sandbox"`:

    ``` python
    opencode(version="sandbox")
    ```

2.  Download the version of OpenCode you want to use into the cache, then specify that version explicitly:

    ``` python
    from inspect_swe import download_agent_binary, opencode

    # download the agent binary during installation/configuration
    download_agent_binary("opencode", "0.29.0", "linux-x64")

    # reference that version in your task (no download will occur)
    opencode(version="0.29.0")
    ```

    Note that the 5 most recently downloaded versions are retained in the cache. Use the [cached_agent_binaries()](./reference/index.html.md#cached_agent_binaries) function to list the contents of the cache.
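
The retention rule can be pictured as a small most-recently-downloaded list. The sketch below is purely illustrative: `BinaryCache` and `record_download()` are hypothetical names, the version numbers are placeholders, and treating a repeat download as "most recent" is an assumption. None of this reflects how `inspect_swe` stores binaries on disk.

```python
from collections import OrderedDict

class BinaryCache:
    """Toy model of a cache that keeps the N most recently downloaded versions."""

    def __init__(self, max_versions: int = 5) -> None:
        self.max_versions = max_versions
        self._versions: "OrderedDict[str, None]" = OrderedDict()

    def record_download(self, version: str) -> None:
        # Assumption: re-downloading a version marks it as most recent again.
        self._versions.pop(version, None)
        self._versions[version] = None
        # Evict the oldest downloads beyond the retention limit.
        while len(self._versions) > self.max_versions:
            self._versions.popitem(last=False)

    def versions(self) -> list[str]:
        return list(self._versions)

cache = BinaryCache()
for v in ["0.24.0", "0.25.0", "0.26.0", "0.27.0", "0.28.0", "0.29.0"]:
    cache.record_download(v)
print(cache.versions())  # the oldest entry, 0.24.0, has been evicted
```

A practical consequence is that if an offline evaluation depends on a specific cached version, downloading several other versions afterwards may evict it.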

## Centaur Mode

The `opencode()` agent can also be run in “centaur” mode, which uses the Inspect AI [Human Agent](https://inspect.aisi.org.uk/human-agent.html) as the solver and makes [OpenCode](https://github.com/anomalyco/opencode) available to the human user for help with the task. Rather than strictly measuring human vs. model performance, you can measure the performance of humans working collaboratively with a model.

Enable centaur mode by passing `centaur=True` to the `opencode()` agent:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_qa

from inspect_swe import opencode

@task
def system_explorer() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=opencode(centaur=True),
        scorer=model_graded_qa(),
        sandbox="docker",
    )
```

You can also enable centaur mode from the CLI using a solver arg (`-S`):

``` bash
inspect eval ctf.py --solver inspect_swe/opencode -S centaur=true
```

You can also pass `CentaurOptions` to further customize the behavior of the human agent. For example:

``` python
from inspect_swe import CentaurOptions

Task(
    dataset=json_dataset("dataset.json"),
    solver=opencode(centaur=CentaurOptions(answer=False)),
    scorer=model_graded_qa(),
    sandbox="docker",
)
```

See the [human_cli()](https://inspect.aisi.org.uk/reference/inspect_ai.agent.html#human_cli) documentation for details on available options.

## Troubleshooting

If OpenCode doesn’t appear to be working, or isn’t working as expected, you can troubleshoot by dumping the OpenCode debug log after an evaluation task is complete. You can do this with:

``` bash
inspect trace dump --filter "OpenCode"
```


# Mini SWE Agent – Inspect SWE

## Overview

The `mini_swe_agent()` agent uses the unattended mode of SWE-agent [mini-swe-agent](https://github.com/SWE-agent/mini-swe-agent) to execute agentic tasks within the Inspect sandbox. Model API calls that occur in the sandbox are proxied back to Inspect for handling by the model provider for the current task.

> **NOTE: mini-swe-agent Installation**
>
> By default, the agent will download the current stable version of mini-swe-agent and copy it to the sandbox. You can also exercise more explicit control over which version of mini-swe-agent is used—see the [Installation](#installation) section below for details.

## Basic Usage

Use the `mini_swe_agent()` agent as you would any Inspect agent. For example, here we use it as the solver in an Inspect task:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_qa

from inspect_swe import mini_swe_agent

@task
def system_explorer() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=mini_swe_agent(),
        scorer=model_graded_qa(),
        sandbox="docker",
    )
```

You can also pass the agent as a `--solver` on the command line:

``` bash
inspect eval ctf.py --solver inspect_swe/mini_swe_agent
```

If you want to try this out locally, see the [system_explorer](https://github.com/meridianlabs-ai/inspect_swe/tree/main/examples/system_explorer/task.py) example.

## Options

The following options are supported for customizing the behavior of the agent:

| Option | Description |
|----|----|
| `system_prompt` | Additional system prompt to append to default system prompt. |
| `centaur` | Run in [Centaur Mode](#centaur-mode), which makes mini-swe-agent available to an Inspect `human_cli()` agent rather than running it unattended. |
| `attempts` | Allow the agent to have multiple scored attempts at solving the task. |
| `model` | Model name to use for agent (defaults to main model for task). |
| `filter` | Filter for intercepting bridged model requests. |
| `retry_refusals` | Should refusals be retried? (pass number of times to retry) |
| `compaction` | Compaction strategy for managing context window overflow. |
| `cwd` | Working directory for mini-swe-agent session. |
| `env` | Environment variables to set for mini-swe-agent. |
| `user` | User to execute mini-swe-agent as in the sandbox. |
| `sandbox` | Sandbox environment name. |
| `version` | Version of mini-swe-agent to use (see [Installation](#installation) below for details) |

For example, here we specify a custom system prompt:

``` python
mini_swe_agent(
    system_prompt="You are an ace system researcher.",
)
```

## Installation

By default, the agent will install the current stable version of mini-swe-agent in the sandbox via Python wheels. You can override this behaviour using the `version` option:

| Option | Description |
|----|----|
| `"stable"` | Install and use the default pinned stable version. |
| `"sandbox"` | Use the version of mini-swe-agent in the sandbox (raises `RuntimeError` if not available in the sandbox) |
| `"latest"` | Install and use the latest version from PyPI. |
| `"x.x.x"` | Install and use a specific version number. |

Unlike the other agents, which use standalone binaries, mini-swe-agent is installed via Python wheels using `uv`. If you don’t ever want to rely on automatic installation of mini-swe-agent (e.g. if you run your evaluations offline), pre-install the version you want to use in the sandbox, for example in your sandbox Dockerfile:

``` dockerfile
RUN pip install mini-swe-agent==2.2.3
```

Then reference the pre-installed version with `version="sandbox"` in your task:

``` python
mini_swe_agent(version="sandbox")
```

## Centaur Mode

The `mini_swe_agent()` agent can also be run in “centaur” mode, which uses the Inspect AI [Human Agent](https://inspect.aisi.org.uk/human-agent.html) as the solver and makes [mini-swe-agent](https://github.com/SWE-agent/mini-swe-agent) available to the human user for help with the task. Rather than strictly measuring human vs. model performance, you can measure the performance of humans working collaboratively with a model.

Enable centaur mode by passing `centaur=True` to the `mini_swe_agent()` agent:

``` python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_qa

from inspect_swe import mini_swe_agent

@task
def system_explorer() -> Task:
    return Task(
        dataset=json_dataset("dataset.json"),
        solver=mini_swe_agent(centaur=True),
        scorer=model_graded_qa(),
        sandbox="docker",
    )
```

You can also enable centaur mode from the CLI using a solver arg (`-S`):

``` bash
inspect eval ctf.py --solver inspect_swe/mini_swe_agent -S centaur=true
```

You can also pass `CentaurOptions` to further customize the behavior of the human agent. For example:

``` python
from inspect_swe import CentaurOptions

Task(
    dataset=json_dataset("dataset.json"),
    solver=mini_swe_agent(centaur=CentaurOptions(answer=False)),
    scorer=model_graded_qa(),
    sandbox="docker",
)
```

See the [human_cli()](https://inspect.aisi.org.uk/reference/inspect_ai.agent.html#human_cli) documentation for details on available options.

## Troubleshooting

If mini-swe-agent doesn’t appear to be working, or isn’t working as expected, you can troubleshoot by dumping the mini-swe-agent debug log after an evaluation task is complete. You can do this with:

``` bash
inspect trace dump --filter "mini-swe-agent"
```


# Reference – Inspect SWE

## Agents

### claude_code

Claude Code agent.

Agent that uses [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) running in a sandbox.

The agent can either use a version of Claude Code installed in the sandbox, or can download a version and install it in the sandbox (see docs on `version` option below for details).

Use `disallowed_tools` to control access to tools. See [Tools available to Claude](https://docs.anthropic.com/en/docs/claude-code/settings#tools-available-to-claude) for the list of built-in tools which can be disallowed.

Use the `attempts` option to enable additional submissions if the initial submission(s) are incorrect (by default, no additional attempts are permitted).

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/_claude_code/claude_code.py#L51)

``` python
@agent
def claude_code(
    name: str = "Claude Code",
    description: str = dedent("""
       Autonomous coding agent capable of writing, testing, debugging,
       and iterating on code across multiple languages.
    """),
    system_prompt: str | None = None,
    skills: Sequence[str | Path | Skill] | None = None,
    mcp_servers: Sequence[MCPServerConfig] | None = None,
    bridged_tools: Sequence[BridgedToolsSpec] | None = None,
    disallowed_tools: list[str] | None = None,
    centaur: bool | CentaurOptions = False,
    attempts: int | AgentAttempts = 1,
    model: str | None = None,
    model_config: str | None = None,
    model_aliases: dict[str, str | Model] | None = None,
    opus_model: str | None = None,
    sonnet_model: str | None = None,
    haiku_model: str | None = None,
    subagent_model: str | None = None,
    filter: GenerateFilter | None = None,
    auto_mode: bool = False,
    retry_refusals: int | None = 3,
    retry_uncaught_errors: int | None = 3,
    cwd: str | None = None,
    env: dict[str, str] | None = None,
    user: str | None = None,
    sandbox: str | None = None,
    version: Literal["auto", "sandbox", "stable", "latest"] | str = "auto",
    debug: bool | None = None,
) -> Agent
```

`name` str  
Agent name (used in multi-agent systems with `as_tool()` and `handoff()`)

`description` str  
Agent description (used in multi-agent systems with `as_tool()` and `handoff()`)

`system_prompt` str \| None  
Additional system prompt to append to default system prompt.

`skills` Sequence\[str \| Path \| Skill\] \| None  
Additional [skills](https://inspect.aisi.org.uk/tools-standard.html#sec-skill) to make available to the agent.

`mcp_servers` Sequence\[MCPServerConfig\] \| None  
MCP servers to make available to the agent.

`bridged_tools` Sequence\[BridgedToolsSpec\] \| None  
Host-side Inspect tools to expose to the agent via MCP. Each BridgedToolsSpec creates an MCP server that makes the specified tools available to the agent running in the sandbox.

`disallowed_tools` list\[str\] \| None  
List of tool names to disallow entirely.

`centaur` bool \| CentaurOptions  
Run in ‘centaur’ mode, which makes Claude Code available to an Inspect `human_cli()` agent rather than running it unattended.

`attempts` int \| AgentAttempts  
Configure agent to make multiple attempts. When this is specified, the task will be scored when the agent stops calling tools. If the scoring is successful, execution will stop. Otherwise, the agent will be prompted to pick up where it left off for another attempt.

`model` str \| None  
Model name to use for Opus and Sonnet calls (defaults to main model for task).

`model_config` str \| None  
Model id used to select the identity Claude Code presents to itself (its “You are powered by the model …” system prompt) and any model-gated client behavior. Defaults to `None`, which derives it from the real served model so the presented identity matches what’s actually running. Purely the displayed identity — calls are still bridged to the served Inspect model regardless. (Claude Code renders the genuine name/cutoff for recognized Anthropic ids and shows other ids verbatim.)

`model_aliases` dict\[str, str \| Model\] \| None  
Optional mapping of model names to Model instances or model name strings. Allows using custom Model implementations (e.g., wrapped Agents) instead of standard models. When a model name in the mapping is referenced, the corresponding Model/string is used.

`opus_model` str \| None  
The model to use for `opus`, or for `opusplan` when Plan Mode is active. Defaults to `model`.

`sonnet_model` str \| None  
The model to use for `sonnet`, or for `opusplan` when Plan Mode is not active. Defaults to `model`.

`haiku_model` str \| None  
The model to use for haiku, or [background functionality](https://code.claude.com/docs/en/costs#background-token-usage). Defaults to `model`.

`subagent_model` str \| None  
The model to use for [subagents](https://code.claude.com/docs/en/sub-agents). Defaults to `model`.

`filter` GenerateFilter \| None  
Filter for intercepting bridged model requests.

`auto_mode` bool  
Use `auto` permission mode rather than `--dangerously-skip-permissions`. Note that this can result in rejected tool calls so only enable if your evaluation can tolerate this.

`retry_refusals` int \| None  
Should refusals be retried? Defaults to retrying up to 3 times.

`retry_uncaught_errors` int \| None  
Should uncaught errors (unexpected crashes of Claude Code) be retried? Defaults to retrying up to 3 times.

`cwd` str \| None  
Working directory to run claude code within.

`env` dict\[str, str\] \| None  
Environment variables to set for claude code.

`user` str \| None  
User to execute claude code with.

`sandbox` str \| None  
Optional sandbox environment name.

`version` Literal\['auto', 'sandbox', 'stable', 'latest'\] \| str  
Version of claude code to use. One of:

- “auto”: Use any available version of claude code in the sandbox, otherwise download the current stable version.
- “sandbox”: Use the version of claude code in the sandbox (raises `RuntimeError` if claude is not available in the sandbox).
- “stable”: Download and use the current stable version of claude code.
- “latest”: Download and use the very latest version of claude code.
- “x.x.x”: Download and use a specific version of claude code.

`debug` bool \| None  
Add `--debug` cli flag and trace all debug output.
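
The control flow described for `attempts` above can be sketched as a simple loop. This is an illustration under stated assumptions, not the package's implementation: `run_with_attempts()`, `run_agent`, and `score` are hypothetical callables standing in for the agent and the task scorer, and the feedback message is invented for the example.

```python
from typing import Callable, Optional

def run_with_attempts(
    run_agent: Callable[[Optional[str]], str],
    score: Callable[[str], bool],
    attempts: int = 1,
) -> tuple[str, int]:
    """Run the agent until an answer scores as correct or attempts run out.

    Returns the final answer and the number of attempts used.
    """
    feedback: Optional[str] = None
    answer = ""
    for attempt in range(1, attempts + 1):
        # The agent works until it stops calling tools, then submits an answer.
        answer = run_agent(feedback)
        if score(answer):
            return answer, attempt
        # Otherwise prompt the agent to pick up where it left off.
        feedback = "Your submission was incorrect. Please continue and try again."
    return answer, attempts

# Toy agent that only succeeds once it has received feedback.
def toy_agent(feedback: Optional[str]) -> str:
    return "42" if feedback else "41"

answer, used = run_with_attempts(toy_agent, lambda a: a == "42", attempts=3)
print(answer, used)
```

With the default of a single attempt, the loop body runs once and the first submission is final.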

### codex_cli

Codex CLI.

Agent that uses OpenAI [Codex CLI](https://github.com/openai/codex) running in a sandbox.

Use the `attempts` option to enable additional submissions if the initial submission(s) are incorrect (by default, no additional attempts are permitted).

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/_codex_cli/codex_cli.py#L64)

``` python
def codex_cli(
    name: str = ...,
    description: str = ...,
    system_prompt: str | None = ...,
    model_config: str | None = ...,
    skills: Sequence[str | Path | Skill] | None = ...,
    mcp_servers: Sequence[MCPServerConfig] | None = ...,
    bridged_tools: Sequence[BridgedToolsSpec] | None = ...,
    web_search: CodexWebSearch = ...,
    goals: bool = ...,
    centaur: bool | CentaurOptions = ...,
    attempts: int | AgentAttempts = ...,
    model: str | None = ...,
    model_aliases: dict[str, str | Model] | None = ...,
    filter: GenerateFilter | None = ...,
    retry_refusals: int | None = ...,
    home_dir: str | None = ...,
    cwd: str | None = ...,
    env: dict[str, str] | None = ...,
    user: str | None = ...,
    sandbox: str | None = ...,
    version: Literal['auto', 'sandbox', 'latest'] | str = ...,
    config_overrides: dict[str, str] | None = ...,
    debug: bool | None = ...,
    *,
    disallowed_tools: list[Literal['web_search']] | None = ...,
) -> Agent
```

`name` str  
Agent name (used in multi-agent systems with `as_tool()` and `handoff()`)

`description` str  
Agent description (used in multi-agent systems with `as_tool()` and `handoff()`)

`system_prompt` str \| None  
Additional system prompt to append to default system prompt.

`model_config` str \| None  
Codex model slug used to select the system prompt and tool set. Defaults to `None`, which derives the slug from the real model so Codex’s prompt/tooling aligns with what’s actually running. Pass an explicit slug to override.

`skills` Sequence\[str \| Path \| Skill\] \| None  
Additional [skills](https://inspect.aisi.org.uk/tools-standard.html#sec-skill) to make available to the agent.

`mcp_servers` Sequence\[MCPServerConfig\] \| None  
MCP servers to make available to the agent.

`bridged_tools` Sequence\[BridgedToolsSpec\] \| None  
Host-side Inspect tools to expose to the agent via MCP. Each BridgedToolsSpec creates an MCP server that makes the specified tools available to the agent running in the sandbox.

`web_search` CodexWebSearch  
Web search mode. Use “live” for live web search, “cached” for cached web search, or “disabled” to disable web search. Defaults to “live”.

`goals` bool  
Enable Codex goal tools (defaults to `True`).

`centaur` bool \| CentaurOptions  
Run in ‘centaur’ mode, which makes Codex CLI available to an Inspect `human_cli()` agent rather than running it unattended.

`attempts` int \| AgentAttempts  
Configure agent to make multiple attempts. When this is specified, the task will be scored when the agent stops calling tools. If the scoring is successful, execution will stop. Otherwise, the agent will be prompted to pick up where it left off for another attempt.

`model` str \| None  
Model name to use (defaults to main model for task).

`model_aliases` dict\[str, str \| Model\] \| None  
Optional mapping of model names to Model instances or model name strings. Allows using custom Model implementations (e.g., wrapped Agents) instead of standard models. When a model name in the mapping is referenced, the corresponding Model/string is used.

`filter` GenerateFilter \| None  
Filter for intercepting bridged model requests.

`retry_refusals` int \| None  
Should refusals be retried? (pass number of times to retry)

`home_dir` str \| None  
Home directory to use for codex cli. If set, AGENTS.md, skills, and the MCP configuration will be written here.

`cwd` str \| None  
Working directory to run codex cli within.

`env` dict\[str, str\] \| None  
Environment variables to set for codex cli

`user` str \| None  
User to execute codex cli with.

`sandbox` str \| None  
Optional sandbox environment name.

`version` Literal\['auto', 'sandbox', 'latest'\] \| str  
Version of codex cli to use. One of:

- “auto”: Use any available version of codex cli in the sandbox, otherwise download the latest version.
- “sandbox”: Use the version of codex cli in the sandbox (raises `RuntimeError` if codex is not available in the sandbox).
- “latest”: Download and use the very latest version of codex cli.
- “x.x.x”: Download and use a specific version of codex cli.

`config_overrides` dict\[str, str\] \| None  
Additional Codex CLI configuration overrides. Each key-value pair is passed as `-c key=value` to the CLI.

`debug` bool \| None  
Trace all debug output.

`disallowed_tools` list\[Literal\['web_search'\]\] \| None  
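
As a rough illustration of the `config_overrides` behaviour described above, the sketch below expands a mapping into repeated `-c key=value` arguments. `override_args()` is a hypothetical helper and the configuration keys are placeholders; the package's own argument handling may differ.

```python
from typing import Optional

def override_args(config_overrides: Optional[dict[str, str]]) -> list[str]:
    """Expand a mapping into repeated `-c key=value` CLI arguments."""
    args: list[str] = []
    for key, value in (config_overrides or {}).items():
        # Each override becomes its own `-c` flag followed by `key=value`.
        args.extend(["-c", f"{key}={value}"])
    return args

print(override_args({"example_key": "example_value", "other_key": "1"}))
```

Because both keys and values are typed as strings, non-string settings would need to be written in whatever textual form the CLI expects.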

### gemini_cli

Gemini CLI agent.

Agent that uses Google [Gemini CLI](https://github.com/google-gemini/gemini-cli) running in a sandbox with Inspect model bridging.

Use the `attempts` option to enable additional submissions if the initial submission(s) are incorrect (by default, no additional attempts are permitted).

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/_gemini_cli/gemini_cli.py#L33)

``` python
@agent
def gemini_cli(
    name: str = "Gemini CLI",
    description: str = dedent("""
       Autonomous coding agent capable of writing, testing, debugging,
       and iterating on code across multiple languages.
    """),
    system_prompt: str | None = None,
    skills: Sequence[str | Path | Skill] | None = None,
    mcp_servers: Sequence[MCPServerConfig] | None = None,
    bridged_tools: Sequence[BridgedToolsSpec] | None = None,
    centaur: bool | CentaurOptions = False,
    attempts: int | AgentAttempts = 1,
    model: str | None = None,
    model_aliases: dict[str, str | Model] | None = None,
    gemini_model: str = "gemini-2.5-pro",
    filter: GenerateFilter | None = None,
    retry_refusals: int | None = None,
    cwd: str | None = None,
    env: dict[str, str] | None = None,
    user: str | None = None,
    sandbox: str | None = None,
    version: Literal["auto", "sandbox", "stable", "latest"] | str = "auto",
    debug: bool | None = None,
) -> Agent
```

`name` str  
Agent name (used in multi-agent systems with `as_tool()` and `handoff()`)

`description` str  
Agent description

`system_prompt` str \| None  
Additional system prompt to append

`skills` Sequence\[str \| Path \| Skill\] \| None  
Additional [skills](https://inspect.aisi.org.uk/tools-standard.html#sec-skill) to make available to the agent.

`mcp_servers` Sequence\[MCPServerConfig\] \| None  
MCP servers to make available to the agent

`bridged_tools` Sequence\[BridgedToolsSpec\] \| None  
Host-side Inspect tools to expose to the agent via MCP

`centaur` bool \| CentaurOptions  
Run in ‘centaur’ mode, which makes Gemini CLI available to an Inspect `human_cli()` agent rather than running it unattended.

`attempts` int \| AgentAttempts  
Configure agent to make multiple attempts

`model` str \| None  
Model name to use for inspect bridge (defaults to main model for task)

`model_aliases` dict\[str, str \| Model\] \| None  
Optional mapping of model names to Model instances or model name strings. Allows using custom Model implementations (e.g., wrapped Agents) instead of standard models. When a model name in the mapping is referenced, the corresponding Model/string is used.

`gemini_model` str  
Gemini model name to pass to the CLI. Use `"gemini-2.5-pro"` (default) or `"gemini-2.5-flash"`. Setting this bypasses Gemini CLI's auto-router; the actual model calls still go through the Inspect bridge.

`filter` GenerateFilter \| None  
Filter for intercepting bridged model requests

`retry_refusals` int \| None  
Should refusals be retried? (pass number of times to retry)

`cwd` str \| None  
Working directory to run gemini cli within

`env` dict\[str, str\] \| None  
Environment variables to set for gemini cli

`user` str \| None  
User to execute gemini cli with

`sandbox` str \| None  
Optional sandbox environment name

`version` Literal\['auto', 'sandbox', 'stable', 'latest'\] \| str  
Version of gemini cli to use. One of:

- `"auto"`: Use any available version in the sandbox, otherwise download the latest version.
- `"sandbox"`: Use the sandbox version (raises `RuntimeError` if not available).
- `"stable"` / `"latest"`: Download and use the latest version.
- `"x.x.x"`: Download and use a specific version.

`debug` bool \| None  
Trace all debug output.
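
The `version` rules above can be read as a small decision procedure. The following is a minimal sketch of that reading, not the package's implementation: `resolve_version` and its `sandbox_version` argument are hypothetical names, and the "x.x.x" form is assumed to be three dot-separated integers.

```python
from __future__ import annotations

import re


def resolve_version(requested: str, sandbox_version: str | None) -> tuple[str, str]:
    """Map a requested `version` value to an (action, version) pair.

    `sandbox_version` is the version already installed in the sandbox,
    or None if nothing is installed there (hypothetical input).
    """
    if requested == "auto":
        # Prefer whatever the sandbox already has, else download the latest.
        if sandbox_version is not None:
            return ("use_sandbox", sandbox_version)
        return ("download", "latest")
    if requested == "sandbox":
        # The docs state that a missing sandbox version raises RuntimeError.
        if sandbox_version is None:
            raise RuntimeError("no version available in sandbox")
        return ("use_sandbox", sandbox_version)
    if requested in ("stable", "latest"):
        # Both values are documented as downloading the latest version.
        return ("download", "latest")
    if re.fullmatch(r"\d+\.\d+\.\d+", requested):
        return ("download", requested)
    raise ValueError(f"unrecognized version: {requested!r}")
```

Under this reading, `"auto"` is the only value whose outcome depends on what the sandbox already contains without ever raising an error.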

### opencode

OpenCode agent.

Agent that uses [OpenCode](https://github.com/anomalyco/opencode) running in a sandbox with Inspect model bridging.

Use the `attempts` option to enable additional submissions if the initial submission(s) are incorrect (by default, no additional attempts are permitted).

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/_opencode/opencode.py#L32)

``` python
@agent
def opencode(
    name: str = "OpenCode",
    description: str = dedent("""
       Open-source autonomous coding agent for the terminal, capable
       of writing, testing, debugging, and iterating on code across
       multiple languages.
    """),
    system_prompt: str | None = None,
    skills: Sequence[str | Path | Skill] | None = None,
    mcp_servers: Sequence[MCPServerConfig] | None = None,
    bridged_tools: Sequence[BridgedToolsSpec] | None = None,
    centaur: bool | CentaurOptions = False,
    attempts: int | AgentAttempts = 1,
    model: str | None = None,
    model_aliases: dict[str, str | Model] | None = None,
    opencode_model: str = "anthropic/claude-sonnet-4-5",
    filter: GenerateFilter | None = None,
    retry_refusals: int | None = None,
    cwd: str | None = None,
    env: dict[str, str] | None = None,
    user: str | None = None,
    sandbox: str | None = None,
    version: Literal["auto", "sandbox", "stable", "latest"] | str = "auto",
    debug: bool | None = None,
) -> Agent
```

`name` str  
Agent name (used in multi-agent systems with `as_tool()` and `handoff()`)

`description` str  
Agent description

`system_prompt` str \| None  
Additional system prompt to append

`skills` Sequence\[str \| Path \| Skill\] \| None  
Additional [skills](https://inspect.aisi.org.uk/tools-standard.html#sec-skill) to make available to the agent.

`mcp_servers` Sequence\[MCPServerConfig\] \| None  
MCP servers to make available to the agent

`bridged_tools` Sequence\[BridgedToolsSpec\] \| None  
Host-side Inspect tools to expose to the agent via MCP

`centaur` bool \| CentaurOptions  
Run in ‘centaur’ mode, which makes OpenCode available to an Inspect `human_cli()` agent rather than running it unattended.

`attempts` int \| AgentAttempts  
Configure agent to make multiple attempts

`model` str \| None  
Model name to use for inspect bridge (defaults to main model for task)

`model_aliases` dict\[str, str \| Model\] \| None  
Optional mapping of model names to Model instances or model name strings. Allows using custom Model implementations (e.g., wrapped Agents) instead of standard models. When a model name in the mapping is referenced, the corresponding Model/string is used.

`opencode_model` str  
OpenCode model identifier to pass to the CLI in the form `provider/model` (default: `"anthropic/claude-sonnet-4-5"`). The actual model calls still go through the Inspect bridge; this just selects which provider client OpenCode uses to format the request.

`filter` GenerateFilter \| None  
Filter for intercepting bridged model requests

`retry_refusals` int \| None  
Should refusals be retried? (pass number of times to retry)

`cwd` str \| None  
Working directory to run opencode within

`env` dict\[str, str\] \| None  
Environment variables to set for opencode

`user` str \| None  
User to execute opencode with

`sandbox` str \| None  
Optional sandbox environment name

`version` Literal\['auto', 'sandbox', 'stable', 'latest'\] \| str  
Version of opencode to use. One of:

- `"auto"`: Use any available version in the sandbox, otherwise download the latest version.
- `"sandbox"`: Use the sandbox version (raises `RuntimeError` if not available).
- `"stable"` / `"latest"`: Download and use the latest version.
- `"x.x.x"`: Download and use a specific version.

`debug` bool \| None  
Trace all debug output.
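
The `opencode_model` option expects a `provider/model` identifier. As an illustration of that format only, here is a hedged sketch of a validator; `split_opencode_model` is a hypothetical helper, and splitting on the first `/` is an assumption rather than documented OpenCode behavior.

```python
def split_opencode_model(identifier: str) -> tuple[str, str]:
    """Split a `provider/model` identifier into its two parts.

    Assumption: the provider is everything before the first slash.
    """
    provider, sep, model = identifier.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {identifier!r}")
    return provider, model


provider, model = split_opencode_model("anthropic/claude-sonnet-4-5")
```

Checking the identifier before passing it to `opencode()` can surface a malformed value before the sandbox is started.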

### mini_swe_agent

mini-swe-agent agent.

Agent that uses [mini-swe-agent](https://github.com/SWE-agent/mini-swe-agent) running in a sandbox. Mini-swe-agent is a minimal 100-line agent that solves GitHub issues using only bash commands.

The agent can either use a version of mini-swe-agent installed in the sandbox, or can download and install it via pip (see docs on `version` option below).

Use `attempts` to enable additional submissions if initial submission(s) are incorrect (by default, no additional attempts are permitted).

This agent does not handle compaction natively. Use `compaction` to specify a compaction strategy.

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/_mini_swe_agent/mini_swe_agent.py#L48)

``` python
@agent
def mini_swe_agent(
    name: str = "mini-swe-agent",
    description: str = dedent("""
       Minimal AI agent that solves software engineering tasks using bash commands.
       100 lines of Python, radically simple, scores >74% on SWE-bench verified.
    """),
    system_prompt: str | None = None,
    centaur: bool | CentaurOptions = False,
    attempts: int | AgentAttempts = 1,
    model: str | None = None,
    model_aliases: dict[str, str | Model] | None = None,
    filter: GenerateFilter | None = None,
    retry_refusals: int | None = None,
    compaction: CompactionStrategy | None = None,
    cwd: str | None = None,
    env: dict[str, str] | None = None,
    user: str | None = None,
    sandbox: str | None = None,
    version: Literal["stable", "sandbox", "latest"] | str = "stable",
    debug: bool | None = None,
) -> Agent
```

`name` str  
Agent name (used in multi-agent systems with `as_tool()` and `handoff()`)

`description` str  
Agent description (used in multi-agent systems)

`system_prompt` str \| None  
Additional system prompt to include (appended to any system messages from the task).

`centaur` bool \| CentaurOptions  
Run in ‘centaur’ mode, which makes mini-swe-agent available to an Inspect `human_cli()` agent rather than running it unattended.

`attempts` int \| AgentAttempts  
Configure agent to make multiple attempts.

`model` str \| None  
Model name to use (defaults to main model for task).

`model_aliases` dict\[str, str \| Model\] \| None  
Optional mapping of model names to Model instances or model name strings. Allows using custom Model implementations (e.g., wrapped Agents) instead of standard models. When a model name in the mapping is referenced, the corresponding Model/string is used.

`filter` GenerateFilter \| None  
Filter for intercepting bridged model requests.

`retry_refusals` int \| None  
Should refusals be retried? (pass number of times to retry)

`compaction` CompactionStrategy \| None  
Compaction strategy for managing context window overflow.

`cwd` str \| None  
Working directory to run mini-swe-agent within.

`env` dict\[str, str\] \| None  
Environment variables to set for mini-swe-agent.

`user` str \| None  
User to execute mini-swe-agent with.

`sandbox` str \| None  
Optional sandbox environment name.

`version` Literal\['stable', 'sandbox', 'latest'\] \| str  
Version of mini-swe-agent to use. One of:

- `"stable"`: Download and install the default pinned version.
- `"sandbox"`: Use the version in the sandbox (raises `RuntimeError` if not available).
- `"latest"`: Download and install the latest version from PyPI.
- `"x.x.x"`: Install and use a specific version.

`debug` bool \| None  
Trace all debug output.
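
Because mini-swe-agent is installed via pip, each `version` value corresponds roughly to a pip requirement. The sketch below illustrates that mapping under stated assumptions: `pip_requirement` is a hypothetical helper, and `pinned` stands in for the package's default pinned version, which is not given here.

```python
from __future__ import annotations


def pip_requirement(version: str, pinned: str) -> str | None:
    """Return the pip requirement implied by `version`, or None.

    `pinned` is a placeholder for the default pinned version
    (hypothetical; the real value lives inside inspect_swe).
    """
    if version == "sandbox":
        # Nothing to install: the sandbox must already provide the agent.
        return None
    if version == "stable":
        return f"mini-swe-agent=={pinned}"
    if version == "latest":
        # An unpinned requirement lets pip pick the newest release on PyPI.
        return "mini-swe-agent"
    # Any other string is treated as an explicit version number.
    return f"mini-swe-agent=={version}"
```

Note that the default here is `"stable"` rather than `"auto"`, so unlike the other agents a preinstalled sandbox copy is only used when `version="sandbox"` is requested.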

## Binaries

### download_agent_binary

Download agent binary.

The downloaded version is added to the cache of downloaded versions, which retains the 5 most recently downloaded versions.

Use this if you need to ensure that a specific version of an agent binary is downloaded in advance (e.g. if you are going to run your evaluations offline). After downloading, explicit requests for the downloaded version (e.g. `claude_code(version="1.0.98")`) will not require network access.

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/_tools/download.py#L53)

``` python
def download_agent_binary(
    binary: Literal["claude_code", "codex_cli"],
    version: Literal["stable", "latest"] | str,
    platform: SandboxPlatform,
) -> None
```

`binary` Literal\['claude_code', 'codex_cli'\]  
Type of binary to download

`version` Literal\['stable', 'latest'\] \| str  
Version to download (“stable”, “latest”, or an explicit version number).

`platform` [SandboxPlatform](../reference/index.html.md#sandboxplatform)  
Target platform (“linux-x64”, “linux-arm64”, “linux-x64-musl”, or “linux-arm64-musl”)
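
The cache keeps only the 5 most recently downloaded versions. A minimal sketch of such a retention policy is shown below; `versions_to_keep` and the `(version, downloaded_at)` pairs are hypothetical and do not reflect how the package actually stores its cache.

```python
def versions_to_keep(
    entries: list[tuple[str, float]], keep: int = 5
) -> list[str]:
    """Return the `keep` most recently downloaded versions, newest first.

    Each entry is a (version, downloaded_at) pair, where `downloaded_at`
    is a timestamp (hypothetical representation of the cache).
    """
    newest_first = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [version for version, _ in newest_first[:keep]]


# Seven downloads with increasing timestamps: the two oldest are dropped.
cache = [(f"1.0.{n}", float(n)) for n in range(1, 8)]
kept = versions_to_keep(cache)
```

One consequence for offline runs: a version pre-fetched with, for example, `download_agent_binary("claude_code", "1.0.98", "linux-x64")` could be evicted if more than four other versions are downloaded afterwards.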

### cached_agent_binaries

List the agent binaries which have been cached on this system.

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/_tools/download.py#L80)

``` python
def cached_agent_binaries(
    binary: Literal["claude_code", "codex_cli"] | None = None, quiet: bool = False
) -> AgentBinaries
```

`binary` Literal\['claude_code', 'codex_cli'\] \| None  
Type of binary to list (lists all binaries if not specified).

`quiet` bool  
Do not print the binaries as a side effect

### download_wheels_tarball

Download all wheels for a package and its dependencies.

Downloads wheels from PyPI for the specified platform and Python version, then bundles them into a tarball for offline installation in a sandbox. Downloaded wheels are cached locally (retaining the 5 most recent versions).

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/_util/agentwheel.py#L304)

``` python
def download_wheels_tarball(
    package_name: str,
    version: str | None,
    platform: SandboxPlatform,
    python_version: str,
) -> tuple[bytes, str]
```

`package_name` str  
PyPI package name (e.g., “mini-swe-agent”)

`version` str \| None  
Package version or None for latest

`platform` [SandboxPlatform](../reference/index.html.md#sandboxplatform)  
Target sandbox platform

`python_version` str  
Python version without dots (e.g., “312”)
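
The `python_version` argument uses the dotless form (for example `"312"` for Python 3.12). A small sketch of building that string follows; `python_tag` is a hypothetical helper, and the value should describe the Python interpreter inside the sandbox, which may differ from the host's.

```python
import sys


def python_tag(major: int, minor: int) -> str:
    """Format a Python version as the dotless tag, e.g. (3, 12) -> "312"."""
    return f"{major}{minor}"


# Tag for the interpreter running this script (the host, not the sandbox).
host_tag = python_tag(sys.version_info.major, sys.version_info.minor)
```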

### AgentBinary

Agent binary.

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/_tools/download.py#L15)

``` python
class AgentBinary(NamedTuple)
```

#### Attributes

`agent` Literal\['claude_code', 'codex_cli'\]  
Agent type.

`version` str  
Agent version.

`path` Path  
Agent path.
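
To illustrate working with records of this shape, the sketch below defines a stand-in tuple with the same three attributes and picks the newest cached version for one agent. `BinaryRecord`, `newest`, and the paths are hypothetical, and versions are assumed to be dot-separated integers.

```python
from __future__ import annotations

from pathlib import Path
from typing import NamedTuple


class BinaryRecord(NamedTuple):
    """Stand-in mirroring the documented AgentBinary attributes."""

    agent: str
    version: str
    path: Path


def newest(records: list[BinaryRecord], agent: str) -> BinaryRecord | None:
    """Return the record with the highest version for `agent`, if any."""
    matching = [r for r in records if r.agent == agent]
    if not matching:
        return None
    # Compare numerically so that 1.0.100 sorts above 1.0.98.
    return max(matching, key=lambda r: tuple(int(p) for p in r.version.split(".")))


records = [
    BinaryRecord("claude_code", "1.0.98", Path("/tmp/cache/claude-1.0.98")),
    BinaryRecord("claude_code", "1.0.100", Path("/tmp/cache/claude-1.0.100")),
    BinaryRecord("codex_cli", "0.9.0", Path("/tmp/cache/codex-0.9.0")),
]
latest_claude = newest(records, "claude_code")
```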

### SandboxPlatform

Target platform identifier for sandbox binary and wheel downloads.

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/_util/sandbox.py#L5)

``` python
SandboxPlatform: TypeAlias = Literal[
    "linux-x64", "linux-arm64", "linux-x64-musl", "linux-arm64-musl"
]
```
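
Since `SandboxPlatform` is a `Literal` alias, the allowed values can be recovered at runtime with `typing.get_args`. The sketch below repeats the alias locally so that it runs without the package installed; `parse_platform` is a hypothetical helper for validating user input such as a command-line argument.

```python
from typing import Literal, get_args

# Local copy of the alias shown above, so the sketch is self-contained.
SandboxPlatform = Literal[
    "linux-x64", "linux-arm64", "linux-x64-musl", "linux-arm64-musl"
]


def parse_platform(value: str) -> str:
    """Return `value` if it is a valid platform, else raise ValueError."""
    allowed = get_args(SandboxPlatform)
    if value not in allowed:
        raise ValueError(f"unknown platform {value!r}; expected one of {allowed}")
    return value
```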

## ACP

### interactive_claude_code

Claude Code agent via ACP.

Uses the `claude-agent-acp` adapter in a sandbox. Supports multi-turn sessions and mid-turn interrupts.

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/acp/_agents/claude_code/claude_code.py#L167)

``` python
def interactive_claude_code(
    *,
    disallowed_tools: list[str] | None = ...,
    skills: list[str | Path | Skill] | None = ...,
    opus_model: str | Model | None = ...,
    sonnet_model: str | Model | None = ...,
    haiku_model: str | Model | None = ...,
    subagent_model: str | Model | None = ...,
    model: str | Model | None = ...,
    filter: GenerateFilter | None = ...,
    bridged_tools: list[BridgedToolsSpec] | None = ...,
    mcp_servers: list[MCPServerConfig] | None = ...,
    system_prompt: str | None = ...,
    retry_refusals: int | None = ...,
    model_map: dict[str, str | Model] | None = ...,
    cwd: str | None = ...,
    env: dict[str, str] | None = ...,
    user: str | None = ...,
    sandbox: str | None = ...,
) -> ACPAgent
```

`disallowed_tools` list\[str\] \| None  
Tool names to disallow.

`skills` list\[str \| Path \| Skill\] \| None  
Additional skills to make available.

`opus_model` str \| Model \| None  
Model for opus calls.

`sonnet_model` str \| Model \| None  
Model for sonnet calls.

`haiku_model` str \| Model \| None  
Model for haiku / background calls.

`subagent_model` str \| Model \| None  
Model for subagents.

`model` str \| Model \| None  

`filter` GenerateFilter \| None  

`bridged_tools` list\[BridgedToolsSpec\] \| None  

`mcp_servers` list\[MCPServerConfig\] \| None  

`system_prompt` str \| None  

`retry_refusals` int \| None  

`model_map` dict\[str, str \| Model\] \| None  

`cwd` str \| None  

`env` dict\[str, str\] \| None  

`user` str \| None  

`sandbox` str \| None  

### interactive_codex_cli

Codex CLI agent via ACP.

Uses the `codex-acp` adapter in a sandbox. Supports multi-turn sessions and mid-turn interrupts.

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/acp/_agents/codex_cli/codex_cli.py#L197)

``` python
def interactive_codex_cli(
    *,
    web_search: CodexWebSearch = ...,
    goals: bool = ...,
    skills: list[str | Path | Skill] | None = ...,
    home_dir: str | None = ...,
    config_overrides: dict[str, str] | None = ...,
    disallowed_tools: list[Literal['web_search']] | None = ...,
) -> ACPAgent
```

`web_search` CodexWebSearch  
Web search mode. Use `"live"` for live web search, `"cached"` for cached web search, or `"disabled"` to disable web search.

`goals` bool  
Enable Codex goal tools.

`skills` list\[str \| Path \| Skill\] \| None  
Additional skills to make available.

`home_dir` str \| None  
Override for `CODEX_HOME` directory in the sandbox.

`config_overrides` dict\[str, str\] \| None  
Extra Codex config.toml key-value pairs.

`disallowed_tools` list\[Literal\['web_search'\]\] \| None  

### interactive_gemini_cli

Gemini CLI agent via ACP.

Uses gemini’s native `--experimental-acp` flag in a sandbox. Supports multi-turn sessions and mid-turn interrupts.

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/acp/_agents/gemini_cli/gemini_cli.py#L152)

``` python
def interactive_gemini_cli(
    *,
    skills: list[str | Path | Skill] | None = ...,
    version: Literal['auto', 'sandbox', 'stable', 'latest'] | str = ...,
    debug: bool = ...,
    model: str | Model | None = ...,
    filter: GenerateFilter | None = ...,
    bridged_tools: list[BridgedToolsSpec] | None = ...,
    mcp_servers: list[MCPServerConfig] | None = ...,
    system_prompt: str | None = ...,
    retry_refusals: int | None = ...,
    model_map: dict[str, str | Model] | None = ...,
    cwd: str | None = ...,
    env: dict[str, str] | None = ...,
    user: str | None = ...,
    sandbox: str | None = ...,
) -> ACPAgent
```

`skills` list\[str \| Path \| Skill\] \| None  
Additional skills to make available.

`version` Literal\['auto', 'sandbox', 'stable', 'latest'\] \| str  
Version of gemini CLI to use. One of: `"auto"`, `"sandbox"`, `"stable"`, `"latest"`, or a specific semver version string.

`debug` bool  
Run gemini-cli with `--debug` and `GEMINI_DEBUG_LOG_FILE` set to `$HOME/gemini-debug.log` in the sandbox (in ACP mode console output is patched away from stderr, so the log file is the only way to surface internals).

`model` str \| Model \| None  

`filter` GenerateFilter \| None  

`bridged_tools` list\[BridgedToolsSpec\] \| None  

`mcp_servers` list\[MCPServerConfig\] \| None  

`system_prompt` str \| None  

`retry_refusals` int \| None  

`model_map` dict\[str, str \| Model\] \| None  

`cwd` str \| None  

`env` dict\[str, str\] \| None  

`user` str \| None  

`sandbox` str \| None  

### bridge_mcp_to_acp

Convert bridge `MCPServerConfigHTTP` objects to ACP `HttpMcpServer`.

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/acp/agent.py#L29)

``` python
def bridge_mcp_to_acp(configs: list[MCPServerConfigHTTP]) -> list[HttpMcpServer]
```

`configs` list\[MCPServerConfigHTTP\]  

### ACPAgent

Base class for ACP-based agents running in sandboxes.

Manages the ACP lifecycle (connection, session, MCP announcement, cleanup). Subclasses implement `_start_agent()` for agent-specific setup.

Sets up the ACP lifecycle, exposes `.conn` and `.session_id`, signals `.ready`, then blocks until the task is cancelled. The caller drives all prompts via `conn.prompt()` / `conn.cancel()`.

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/acp/agent.py#L76)

``` python
class ACPAgent(Agent)
```

### ACPAgentParams

Keyword arguments accepted by [ACPAgent](../reference/index.html.md#acpagent).

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/acp/agent.py#L45)

``` python
class ACPAgentParams(TypedDict, total=False)
```

### acp_connection

Bridge an `ExecRemoteProcess` to ACP. Yield `(conn, feeder, error_info)`.

Bridges `ExecRemoteProcess` to the SDK’s `connect_to_agent()` via a transport wrapper, then cleans up on exit.

*feeder* is a background task that reads process stdout and feeds it into the ACP reader. It completes when the process exits, so callers can `await feeder` to detect unexpected process termination.

*proc_info* (the third yielded value, an `ErrorInfo`) collects stderr output and the exit code as the process runs. Examine it after `await feeder` for full diagnostics.

Usage:

    async with acp_connection(proc) as (conn, feeder, proc_info):
        await conn.initialize(...)
        session = await conn.new_session(...)
        await conn.prompt(...)

[Source](https://github.com/meridianlabs-ai/inspect_swe/blob/49a7c3004a872b87f9c22fc53036b397b660716f/src/inspect_swe/acp/client.py#L255)

``` python
@contextlib.asynccontextmanager
async def acp_connection(
    proc: ExecRemoteProcess,
) -> AsyncIterator[tuple[ClientSideConnection, asyncio.Task[None], ErrorInfo]]
```

`proc` ExecRemoteProcess
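
Because `feeder` completes when the process exits, a caller can race it against an in-flight prompt to notice unexpected termination. The following is a minimal sketch of that pattern using only `asyncio`; the two coroutines are stand-ins for `conn.prompt(...)` and the real feeder task, and `prompt_or_exit` is a hypothetical helper.

```python
import asyncio
import contextlib


async def prompt_or_exit(prompt: asyncio.Task, feeder: asyncio.Task) -> str:
    """Wait for whichever finishes first: the prompt or the process."""
    done, _ = await asyncio.wait(
        {prompt, feeder}, return_when=asyncio.FIRST_COMPLETED
    )
    if feeder in done and prompt not in done:
        # The process exited while the prompt was still pending.
        prompt.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await prompt
        return "process exited"
    return "prompt finished"


async def demo() -> str:
    async def fake_feeder() -> None:
        await asyncio.sleep(0.01)  # stand-in: the process exits quickly

    async def fake_prompt() -> None:
        await asyncio.sleep(10)  # stand-in: a long-running prompt

    feeder = asyncio.create_task(fake_feeder())
    prompt = asyncio.create_task(fake_prompt())
    return await prompt_or_exit(prompt, feeder)


result = asyncio.run(demo())
```

In real use, the branch that detects process exit is where the collected stderr output and exit code would be examined for diagnostics.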