Custom Scanners
Overview
Scanners are the main unit of processing in Inspect Scout and can target a wide variety of content types. In this article we’ll cover the basic scanning concepts, and then drill into creating scanners that target various types (Transcript, ChatMessage, Event, or Timeline) as well as creating custom loaders which enable scanning of lists of events or messages.
This article goes in depth on custom scanner development. If you are looking for a straightforward high-level way to create an LLM-based scanner see the LLM Scanner documentation.
Note that you can also use scanners directly as Inspect scorers (see Scanners as Scorers for details).
Scanner Basics
A Scanner is a function that takes a ScannerInput (typically a Transcript, but possibly an Event, ChatMessage, Timeline, or list of events or messages) and returns a Result.
The result includes a value which can be of any type—this might be True to indicate that something was found but might equally be a number to indicate a count. More elaborate scanner values (dict or list) are also possible.
Here is a simple scanner that uses a model to look for agent “confusion”—whether or not it finds confusion, it still returns the model completion as an explanation:
from inspect_scout import (
    Result, Scanner, Transcript, scanner, message_numbering
)
from inspect_ai.model import get_model
import re

@scanner(messages="all")
def confusion() -> Scanner[Transcript]:
    async def scan(transcript: Transcript) -> Result:
        # setup message numbering
        messages_as_str, extract_refs = message_numbering()

        # call model
        output = await get_model().generate(
            "Here is a transcript of an LLM agent " +
            "solving a puzzle:\n\n" +
            "===================================\n" +
            await messages_as_str(transcript.messages) +
            "\n===================================\n\n" +
            "In the transcript above do you see the " +
            "agent becoming confused? Respond " +
            "beginning with 'Yes' or 'No', followed " +
            "by an explanation."
        )

        # extract the first word
        match = re.match(r"^\w+", output.completion.strip())

        # return result
        if match:
            answer = match.group(0)
            explanation = output.completion
            return Result(
                value=answer.lower() == "yes",
                answer=answer,
                explanation=explanation,
                references=extract_refs(explanation),
            )
        else:
            return Result(value=False, explanation=output.completion)

    return scan

This scanner illustrates some of the lower-level mechanics of building custom scanners. You can also use the higher-level llm_scanner() to implement this in far fewer lines of code:
from inspect_scout import Scanner, Transcript, llm_scanner, scanner

@scanner(messages="all")
def confusion() -> Scanner[Transcript]:
    return llm_scanner(
        question="In the transcript above do you see " +
                 "the agent becoming confused?",
        answer="boolean",
    )

Input Types
Transcript is the most common ScannerInput; however, several other types are possible:

- Event — Single event from the transcript (e.g. ModelEvent, ToolEvent, etc.).
- ChatMessage — Single chat message from the transcript message history.
- Timeline — Hierarchical span tree representing the structure of agent execution. See Timeline Scanners below for details.
- list[Event] or list[ChatMessage] — Arbitrary sets of events or messages extracted from the Transcript (see Loaders below for details).
See the sections on Transcripts, Events, Messages, Timelines, and Loaders below for additional details on handling various input types.
Input Filtering
One important principle of the Inspect Scout transcript pipeline is that only the precise data to be scanned should be read, and nothing more. This can dramatically improve performance as messages and events that won’t be seen by scanners are never deserialized. Scanner input filters are specified as arguments to the @scanner decorator (you may have noticed the messages="all" attached to the scanner decorator in the example above).
For example, here we are looking for instances of assistants swearing—for this task we only need to look at assistant messages, so we specify messages=["assistant"]:
@scanner(messages=["assistant"])
def assistant_swearing() -> Scanner[Transcript]:
    async def scan(transcript: Transcript) -> Result:
        swear_words = [
            word
            for m in transcript.messages
            for word in extract_swear_words(m.text)
        ]
        return Result(
            value=len(swear_words),
            explanation=",".join(swear_words)
        )

    return scan

With this filter, only assistant messages (and no events at all) will be loaded from transcripts during scanning.
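The extract_swear_words() helper used in this example is not provided by Inspect Scout; you would supply your own. Here is a minimal sketch, where the deny list is purely illustrative:

```python
import re

# Purely illustrative deny list; a real scanner would use a fuller lexicon.
SWEAR_WORDS = {"damn", "hell", "crap"}

def extract_swear_words(text: str) -> list[str]:
    # Tokenize on runs of letters/apostrophes, then keep tokens in the list.
    return [
        word
        for word in re.findall(r"[a-z']+", text.lower())
        if word in SWEAR_WORDS
    ]
```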
Note that by default, no filters are active, so if you don’t specify values for messages, events, and/or timeline your scanner will not be called!
The available filter parameters are:
- messages — Filter for message types: "all" or a list like ["user", "assistant"].
- events — Filter for event types: "all" or a list like ["model", "tool"].
- timeline — Enable timeline loading: True for all event types, "all", or a list like ["model", "tool"].
You can also provide a version parameter to track scanner versions. When the version is incremented, previously scanned transcripts will be re-scanned with the updated scanner.
Transcripts
Transcripts are the most common input to scanners. If you are reading from Inspect eval logs, each log will have samples * epochs transcripts. If you are reading from another source, each agent trace will yield a single Transcript.
Transcript Fields
Here are the available Transcript fields:
| Field | Type | Description |
|---|---|---|
| transcript_id | str | Globally unique identifier for a transcript (maps to EvalSample.uuid in Inspect logs). |
| source_type | str | Type of transcript source (e.g. "eval_log", "weave", etc.). |
| source_id | str | Globally unique identifier for a transcript source (maps to eval_id in Inspect logs). |
| source_uri | str | URI for source data (e.g. full path to the Inspect log file). |
| date | iso | Date/time when the transcript was created. |
| task_set | str | Set from which transcript task was drawn (e.g. Inspect task name or benchmark name). |
| task_id | str | Identifier for task (e.g. dataset sample id). |
| task_repeat | int | Repeat for a given task id within a task set (e.g. epoch). |
| agent | str | Agent used to execute task. |
| agent_args | dict (JSON) | Arguments passed to create agent. |
| model | str | Main model used by agent. |
| model_options | dict (JSON) | Generation options for main model. |
| score | JsonValue (JSON) | Value indicating score on task. |
| success | bool | Boolean reduction of score to succeeded/failed. |
| message_count | int | Total messages in conversation. |
| total_time | number | Time required to execute task (seconds). |
| total_tokens | number | Tokens spent in execution of task. |
| error | str | Error message that terminated the task. |
| limit | str | Limit that caused the task to exit (e.g. "tokens", "messages", etc.). |
| metadata | dict[str, JsonValue] | Transcript source specific metadata (e.g. model, task name, errors, epoch, dataset sample id, limits, etc.). |
| messages | list[ChatMessage] | Message history. |
| events | list[Event] | Event history (e.g. model events, tool events, etc.). |
| timelines | list[Timeline] | Optional list of custom timelines for this transcript. |
Content Filtering
Note that the messages and events fields will not be populated unless you specify a messages or events filter on your scanner. For example, this scanner will see all messages and events:
@scanner(messages="all", events="all")
def my_scanner() -> Scanner[Transcript]: ...

This scanner will see only model and tool events:
@scanner(events=["model", "tool"])
def my_scanner() -> Scanner[Transcript]: ...

This scanner will see only assistant messages:
@scanner(messages=["assistant"])
def my_scanner() -> Scanner[Transcript]: ...

Presenting Messages
When processing transcripts, you will often want to present an entire message history to a model for analysis. The message_numbering() function provides numbered message formatting and reference extraction:
# setup message numbering
messages_as_str, extract_refs = message_numbering()

# call model
output = await get_model().generate(
    "Here is a transcript of an LLM agent " +
    "solving a puzzle:\n\n" +
    "===================================\n" +
    await messages_as_str(transcript.messages) +
    "\n===================================\n\n" +
    "In the transcript above do you see the agent " +
    "becoming confused? Respond beginning with 'Yes' " +
    "or 'No', followed by an explanation."
)

# extract references from the model's explanation
explanation = output.completion
references = extract_refs(explanation)

The message_numbering() function returns a (messages_as_str, extract_refs) pair:
- messages_as_str() converts a list of messages into a numbered string representation, using auto-incrementing labels ([M1], [M2], etc.). If called multiple times within the same numbering scope, numbering continues where it left off (e.g. the second call starts at [M6] if the first call rendered five messages).
- extract_refs() resolves citations like [M3] in model output back to message IDs, producing Reference objects suitable for Result.references.
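To build intuition for how this pair works together, here is a toy version of numbered rendering and citation extraction. This is purely illustrative (the Msg class, function names, and format are assumptions, not the actual message_numbering() implementation):

```python
import re
from dataclasses import dataclass

@dataclass
class Msg:
    role: str
    text: str

def number_messages(messages: list[Msg], start: int = 1) -> str:
    # Render each message with an auto-incrementing [Mn] label.
    return "\n".join(
        f"[M{start + i}] {m.role}: {m.text}" for i, m in enumerate(messages)
    )

def extract_citations(completion: str) -> list[int]:
    # Pull [Mn] citations out of model output as message numbers.
    return [int(n) for n in re.findall(r"\[M(\d+)\]", completion)]
```

The real extract_refs() goes further, mapping citation numbers back to message IDs and returning Reference objects.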
You can optionally pass a MessagesPreprocessor to message_numbering() to control which messages are included. Available options include exclude_system, exclude_reasoning, and exclude_tool_usage.
Event Scanners
To write a scanner that targets events, write a function that takes the event type(s) you want to process. For example, this scanner will see only model events:
@scanner
def my_scanner() -> Scanner[ModelEvent]:
    async def scan(event: ModelEvent) -> Result:
        ...
    return scan

Note that the events=["model"] filter was not required since we had already declared our scanner to take only model events. If we wanted to take both model and tool events we'd do this:
@scanner
def my_scanner() -> Scanner[ModelEvent | ToolEvent]:
    async def scan(event: ModelEvent | ToolEvent) -> Result:
        ...
    return scan

Message Scanners
To write a scanner that targets messages, write a function that takes the message type(s) you want to process. For example, this scanner will only see tool messages:
@scanner
def my_scanner() -> Scanner[ChatMessageTool]:
    async def scan(message: ChatMessageTool) -> Result:
        ...
    return scan

This scanner will see only user and assistant messages:
@scanner
def my_scanner() -> Scanner[ChatMessageUser | ChatMessageAssistant]:
    async def scan(message: ChatMessageUser | ChatMessageAssistant) -> Result:
        ...
    return scan

Timeline Scanners
Timelines provide a hierarchical view of agent execution, organizing flat events into a tree of spans that represent agent invocations, tool calls, and other structured activities. While flat message and event lists work well for simple transcripts, timelines are essential for understanding multi-agent or deeply nested agent workflows.
To create a timeline scanner, use the timeline filter on the @scanner decorator or annotate your scanner with the Timeline type. You can also filter which event types are included in the timeline:
@scanner(timeline=True)
def my_scanner() -> Scanner[Timeline]: ...
@scanner(timeline=["model", "tool"])
def my_scanner() -> Scanner[Timeline]: ...

Timeline scanning differs from transcript scanning in that each span in the timeline tree is scanned independently, allowing you to analyze individual agent invocations and their relationships.
Note that timeline scanners are available only in the development version of Inspect Scout. See the Multi Agent article for a full description of the timeline data model and examples of timeline scanners.
Multiple Results
Scanners can return multiple results as a list. For example:
return [
    Result(label="deception", value=True, explanation="..."),
    Result(label="misconfiguration", value=True, explanation="...")
]

This is useful when a scanner is capable of making several types of observation. In this case it’s also important to indicate the origin of the result (i.e. which class of observation it is), which you can do using the label field (note that label can repeat multiple times in a set, so e.g. you could have multiple results with label="deception").
When a list is returned, each individual result will yield its own row in the results data frame.
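To build intuition for this flattening, here is a toy illustration (not Scout's actual implementation) of how labeled results expand into one row per result:

```python
# One scan may produce a list of labeled results per transcript.
scan_output = {
    "transcript-1": [
        {"label": "deception", "value": True},
        {"label": "misconfiguration", "value": False},
    ],
    "transcript-2": [
        {"label": "deception", "value": False},
    ],
}

# Each individual result becomes its own row in the results data frame,
# with the label distinguishing the observation type within a result set.
rows = [
    {"transcript_id": tid, **result}
    for tid, results in scan_output.items()
    for result in results
]
# rows has one entry per result (3 total), not one per transcript
```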
When validating scanners that return lists of results, you can use result set validation to specify expected values for each label independently.
Custom Loaders
Sometimes the multiple discrete items you want to process from a Transcript don’t fall neatly into single messages or events. For example, you might want to process pairs of user/assistant messages. To do this, create a custom Loader that yields the content as required.
For example, here is a Loader that yields user/assistant message pairs:
@loader(messages=["user", "assistant"])
def conversation_turns():
    async def load(
        transcript: Transcript
    ) -> AsyncIterator[list[ChatMessage]]:
        for user, assistant in message_pairs(transcript.messages):
            yield [user, assistant]
    return load

Note that just like with scanners, the loader still needs to specify messages=["user", "assistant"] in order to see those messages.
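The message_pairs() helper used above is not provided by Inspect Scout; you would write your own. A minimal sketch, matching on each message's role attribute, might look like this:

```python
from typing import Any, Iterator

def message_pairs(messages: list[Any]) -> Iterator[tuple[Any, Any]]:
    # Pair each user message with the next assistant message that follows it;
    # messages with other roles (system, tool) are skipped.
    user = None
    for m in messages:
        if m.role == "user":
            user = m
        elif m.role == "assistant" and user is not None:
            yield (user, m)
            user = None
```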
We can now use this loader in a scanner that looks for refusals:
@scanner(loader=conversation_turns())
def assistant_refusals() -> Scanner[list[ChatMessage]]:
    async def scan(messages: list[ChatMessage]) -> Result:
        user, assistant = messages
        return Result(
            value=is_refusal(assistant.text),
            explanation=await messages_as_str(messages)
        )
    return scan

Packaging
A convenient way to distribute scanners is to include them in a Python package. This makes it very easy for others to use your scanner and ensure they have all of the required dependencies.
Scanners in packages can be registered such that users can easily refer to them by name from the CLI. For example, if your package is named myscanners and your scanner is named reward_hacking you could do a scan with:
scout scan myscanners/reward_hacking

Example
Here’s an example that walks through all of the requirements for registering scanners in packages. Let’s say your package is named myscanners and has a scanner named reward_hacking in the scanners.py file:
myscanners/
myscanners/
scanners.py
_registry.py
pyproject.toml
The _registry.py file serves as a place to import things that you want registered with Inspect. For example:
_registry.py
from .scanners import reward_hacking

You can then register reward_hacking (and anything else imported into _registry.py) as a setuptools entry point. This will ensure that Inspect can resolve references to your package from the CLI. Here is how this looks in pyproject.toml:
[project.entry-points.inspect_ai]
myscanners = "myscanners._registry"

Or, if you use Poetry:

[tool.poetry.plugins.inspect_ai]
myscanners = "myscanners._registry"

Now, anyone that has installed your package can use your scanner as follows:
scout scan myscanners/reward_hacking

Scanners as Scorers
You may have noticed that scanners are very similar to Inspect Scorers. This is by design, and it is actually possible to use scanners directly as Inspect scorers.
For example, here is a simple scanner that checks for agent confusion:
@scanner(messages="all")
def confusion() -> Scanner[Transcript]:
    return llm_scanner(
        question="In the transcript above do you see " +
                 "the agent becoming confused?",
        answer="boolean",
    )

We can use this directly in an Inspect Task as follows:
from .scanners import confusion

@task
def mytask():
    return Task(
        ...,
        scorer=confusion()
    )

We can also use it with the inspect score command:
inspect score --scorer scanners.py@confusion logfile.eval

Metrics
The metrics used for the scorer will default to mean() and stderr()—however, you can also explicitly specify metrics on the @scanner decorator:
@scanner(messages="all", metrics=[mean(), bootstrap_stderr()])
def confusion() -> Scanner[Transcript]: ...

If you are interfacing with code that expects only Scorer instances, you can also use the as_scorer() function from Inspect Scout to explicitly convert your scanner to a scorer:
from inspect_ai import eval
from inspect_scout import as_scorer

from .mytasks import ctf_task
from .scanners import confusion

eval(ctf_task(scorer=as_scorer(confusion())))

Result Sets
If your scanner yields multiple results you can still use it as a scorer, but you will want to provide a dictionary of metrics corresponding to the labels used by your results. For example, if you have a scanner that can yield results with label="deception" or label="misconfiguration", you might declare your metrics like this:
@scanner(messages="all", metrics=[{
    "deception": [mean(), stderr()],
    "misconfiguration": [mean(), stderr()]
}])
def my_scanner() -> Scanner[Transcript]: ...

Or you can use a glob (*) to use the same metrics for all labels:
@scanner(messages="all", metrics=[{"*": [mean(), stderr()]}])
def my_scanner() -> Scanner[Transcript]: ...

You should also be sure to return a result for each supported label (so that metrics can be computed correctly on each row).
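To see why metrics are keyed by label, here is a toy per-label aggregation (an illustration only, not Scout's implementation):

```python
from collections import defaultdict

# Rows as they might appear in the results data frame: one per result.
rows = [
    {"label": "deception", "value": 1.0},
    {"label": "deception", "value": 0.0},
    {"label": "misconfiguration", "value": 1.0},
]

# Group result values by label, then aggregate each group independently,
# mirroring how each label gets its own metrics.
by_label: dict[str, list[float]] = defaultdict(list)
for row in rows:
    by_label[row["label"]].append(row["value"])

label_means = {label: sum(v) / len(v) for label, v in by_label.items()}
```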
Scanner Metrics
You can add metrics to scanners to aggregate result values. Metrics are computed during scanning and available as part of the scan results. For example:
from inspect_ai.scorer import mean

@scanner(messages="all", metrics=[mean()])
def efficiency() -> Scanner[Transcript]:
    return llm_scanner(
        question="On a scale of 1 to 10, how efficiently did the assistant perform?",
        answer="numeric",
    )
)Note that we import the mean metric from inspect_ai. You can use any standard Inspect metric or create custom metrics, and can optionally include more than one metric (e.g. stderr).
See the Inspect documentation on Built in Metrics and Custom Metrics for additional details.