Grep Scanner

Overview

grep_scanner() provides pattern-based scanning of transcripts using grep-style matching. Unlike the LLM Scanner, which uses a language model for analysis, grep_scanner() performs fast, deterministic text pattern matching.

By default, the grep scanner does simple literal string matching; an optional regex mode handles more complex patterns. You can also search for multiple patterns at once (combined with OR logic), or use labeled patterns to get categorized results.

Basic Usage

Here is a simple example that finds occurrences of “error” in assistant messages:

from inspect_scout import Scanner, Transcript, grep_scanner, scanner

@scanner(messages=["assistant"])
def find_errors() -> Scanner[Transcript]:
    return grep_scanner("error")

The scanner returns a Result with:

value: Count of matches found (integer)
explanation: Context snippets showing each match
references: Message references for match locations

Pattern Types

Single Pattern

The simplest form takes a single string pattern:

grep_scanner("error")

Multiple Patterns

Pass a list to match any of multiple patterns (OR logic):

grep_scanner(["error", "failed", "exception"])

All matches across all patterns are aggregated into a single count.
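The aggregation described above can be illustrated with the standard library alone. This is a minimal sketch of the idea, not the library's implementation: `count_any` is a hypothetical helper, and how grep_scanner() itself treats overlapping patterns is not covered here.

```python
import re

def count_any(text: str, patterns: list[str]) -> int:
    """Count matches of any literal pattern, ignoring case (hypothetical helper)."""
    # Escape each pattern so it is matched literally, then join with | for OR logic.
    combined = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    return len(combined.findall(text))

text = "Error: request failed. Retrying after exception; second attempt failed."
total = count_any(text, ["error", "failed", "exception"])
print(total)
```

Here one "Error", two "failed", and one "exception" all contribute to the same total.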

Labeled Patterns

Pass a dict to get separate results for each category:

@scanner(messages="all")
def categorized_issues() -> Scanner[Transcript]:
    return grep_scanner({
        "errors": ["error", "failed", "exception"],
        "warnings": ["warning", "caution"],
        "security": ["password", "secret", "token"],
    })

This returns a list[Result], one per label:

# Returns:
# - Result(label="errors", value=3, ...)
# - Result(label="warnings", value=1, ...)
# - Result(label="security", value=0, ...)
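To make the per-label counting concrete, here is a rough standard-library sketch. `count_by_label` is a hypothetical helper that returns a plain dict rather than Result objects; it only mirrors the counting behavior described above.

```python
import re

def count_by_label(text: str, labeled: dict[str, list[str]]) -> dict[str, int]:
    """Count literal, case-insensitive matches separately per label (hypothetical helper)."""
    counts = {}
    for label, patterns in labeled.items():
        # Each label gets its own OR-combined pattern and its own count.
        combined = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
        counts[label] = len(combined.findall(text))
    return counts

text = "Warning: token refresh failed with an error."
counts = count_by_label(text, {
    "errors": ["error", "failed", "exception"],
    "warnings": ["warning", "caution"],
    "security": ["password", "secret", "token"],
})
print(counts)
```

A label with no matches still appears in the output with a count of 0, which corresponds to the `value=0` result shown above.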

Options

Case Sensitivity

By default, matching is case-insensitive (like grep -i):

grep_scanner("error")  # Matches "error", "ERROR", "Error"

For case-sensitive matching:

grep_scanner("error", ignore_case=False)  # Only matches "error"
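The difference is the same as toggling `re.IGNORECASE` in Python's `re` module. The sketch below assumes that mapping purely for illustration; it does not call grep_scanner().

```python
import re

text = "error, ERROR, Error"

# Analogous to the default (case-insensitive) behavior.
insensitive = re.findall(re.escape("error"), text, flags=re.IGNORECASE)

# Analogous to ignore_case=False.
sensitive = re.findall(re.escape("error"), text)

print(insensitive)
print(sensitive)
```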

Regex Mode

By default, patterns are treated as literal strings:

grep_scanner("file.*txt")  # Matches literal "file.*txt"

Enable regex mode for regular expression patterns:

grep_scanner(r"https?://\S+", regex=True)  # Matches URLs
grep_scanner(r"\d{3}-\d{4}", regex=True)   # Matches phone patterns
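The distinction between the two modes can be seen with Python's `re` module, where `re.escape` turns a pattern into a literal. This sketch assumes literal mode behaves like an escaped pattern; the library's internals are not shown in this document.

```python
import re

text = "see file.*txt and file_report.txt"

# Literal interpretation: "." and "*" are ordinary characters.
literal = re.findall(re.escape("file.*txt"), text)

# Regex interpretation: ".*" greedily matches any run of characters.
as_regex = re.findall(r"file.*txt", text)

print(literal)
print(as_regex)
```

The literal form finds only the exact text `file.*txt`, while the regex form matches one long span from the first "file" to the last "txt".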

Word Boundary

Match whole words only:

grep_scanner("error", word_boundary=True)
# Matches "error" but not "errorCode" or "myerror"

This adds \b word boundary anchors around the pattern.
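The effect of the `\b` anchors can be checked directly with Python's `re` module. The pattern construction below is an assumption about how such anchoring might be done, shown only to illustrate which strings match.

```python
import re

# Wrap the escaped literal in \b anchors so only whole words match.
pattern = re.compile(r"\b" + re.escape("error") + r"\b", re.IGNORECASE)

text = "error in errorCode, myerror, and Error."
matches = pattern.findall(text)
print(matches)
```

Note that `\b` treats letters, digits, and underscores as word characters, so "error_code" would not match either.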

Scanner Results

Result Structure

For single/list patterns:

Field	Type	Description
`value`	`int`	Count of matches found
`explanation`	`str | None`	Context snippets for each match
`references`	`list[Reference]`	Message/event citations

For labeled patterns (dict), each label produces its own Result with the same structure, plus:

Field	Type	Description
`label`	`str`	The category label

Context Snippets

The explanation field shows context around each match:

[M1]: ...request returned an **error** code 500...
[M2]: ...operation **error**: connection timeout...
[E1]: TOOL (search_files): Result: Found **error** in file.py...

Up to 50 characters before/after each match
Match text highlighted with **bold**
Ellipsis (...) when context is truncated
Reference prefix ([M1], [M2] for messages, [E1], [E2] for events)
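The snippet rules listed above can be sketched as a small formatting function. `snippet` is a hypothetical helper written from those rules; the exact output of grep_scanner() (for example, how it trims at word boundaries) may differ.

```python
import re

def snippet(text: str, match: re.Match, prefix: str, width: int = 50) -> str:
    """Format one match with surrounding context (hypothetical helper)."""
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))
    # Add an ellipsis only on the side where context was actually cut off.
    before = ("..." if start > 0 else "") + text[start:match.start()]
    after = text[match.end():end] + ("..." if end < len(text) else "")
    return f"[{prefix}]: {before}**{match.group(0)}**{after}"

text = "The request returned an error code 500"
m = re.search("error", text, re.IGNORECASE)
line = snippet(text, m, "M1")
print(line)
```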

Searching Events

To search events (tool calls, errors, etc.), use the events parameter in the @scanner decorator:

Events Only

@scanner(events=["tool", "error"])
def find_tool_errors() -> Scanner[Transcript]:
    return grep_scanner("timeout")

Both Messages and Events

@scanner(messages="all", events=["tool", "error"])
def find_all_errors() -> Scanner[Transcript]:
    return grep_scanner("error")

The scanner automatically searches whatever is populated in the transcript:

Messages are searched if transcript.messages is populated
Events are searched if transcript.events is populated

References use [M1], [M2] for messages and [E1], [E2] for events.

Supported Event Types

Event Type	What’s Searched
`model`	Model output completion text
`tool`	Function name, arguments, and result
`error`	Error message
`info`	Data field (JSON stringified if object)
`logger`	Log message text
`approval`	Message, tool call, and decision

Examples

Finding Profanity

@scanner(messages=["assistant"])
def profanity_check() -> Scanner[Transcript]:
    return grep_scanner(
        ["damn", "hell", "crap"],
        word_boundary=True,  # Avoid partial matches
    )

Detecting URLs

@scanner(messages="all")
def url_detection() -> Scanner[Transcript]:
    return grep_scanner(
        r"https?://[^\s<>\"']+",
        regex=True,
    )

Multi-Category Analysis

@scanner(messages=["assistant"])
def content_analysis() -> Scanner[Transcript]:
    return grep_scanner({
        "code_snippets": [r"```", "def ", "class ", "function"],
        "questions": [r"\?$"],
        "commands": ["please", "could you", "can you"],
    }, regex=True)
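One caveat about the `\?$` pattern: in Python's `re` module, `$` matches only at the end of the string (or just before a trailing newline) unless `re.MULTILINE` is set. Whether grep_scanner() applies that flag is not stated in this document, so the sketch below only shows the difference in plain `re`.

```python
import re

text = "Is this right?\nYes.\nAre you sure?"

# Without MULTILINE, "$" anchors to the end of the whole string.
end_only = re.findall(r"\?$", text)

# With MULTILINE, "$" also anchors to the end of each line.
per_line = re.findall(r"\?$", text, flags=re.MULTILINE)

print(len(end_only), len(per_line))
```

If questions in the middle of a multi-line message matter, it is worth testing the pattern against a representative transcript before relying on the count.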

Searching Tool Events

@scanner(events=["tool"])
def tool_failures() -> Scanner[Transcript]:
    return grep_scanner(
        ["timeout", "connection refused", "permission denied"],
        ignore_case=True,  # Already the default; stated explicitly for clarity
    )

Messages and Events

@scanner(messages="all", events=["tool", "error"])
def comprehensive_error_check() -> Scanner[Transcript]:
    return grep_scanner({
        "errors": ["error", "exception", "failed"],
        "warnings": ["warning", "deprecated"],
    })