Grep Scanner

Overview

grep_scanner() provides pattern-based scanning of transcripts using grep-style matching. Unlike the LLM Scanner, which uses a language model for analysis, grep_scanner() performs fast, deterministic text pattern matching.

By default, the grep scanner does simple literal string matching; an optional regex mode handles more complex patterns. You can also search for multiple patterns at once (combined with OR logic) or supply labeled patterns to get categorized results.

Basic Usage

Here is a simple example that finds occurrences of “error” in assistant messages:

from inspect_scout import Scanner, Transcript, grep_scanner, scanner

@scanner(messages=["assistant"])
def find_errors() -> Scanner[Transcript]:
    return grep_scanner("error")

The scanner returns a Result with:

  • value: Count of matches found (integer)
  • explanation: Context snippets showing each match
  • references: Message references for match locations

Pattern Types

Single Pattern

The simplest form takes a single string pattern:

grep_scanner("error")

Multiple Patterns

Pass a list to match any of multiple patterns (OR logic):

grep_scanner(["error", "failed", "exception"])

All matches across all patterns are aggregated into a single count.
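
To make the OR semantics concrete, here is a minimal sketch using only Python's standard re module. It is not the library's implementation; count_any is a hypothetical helper, and the assumption is that each pattern is escaped (literal mode) and the alternatives are joined into one case-insensitive expression.

```python
import re

def count_any(text: str, patterns: list[str]) -> int:
    """Count matches of any literal pattern (hypothetical helper, not part of inspect_scout)."""
    # Escape each pattern so it is matched literally, then join with "|" for OR logic.
    combined = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    return len(combined.findall(text))

text = "Error: request failed. Retrying after exception; second attempt failed."
print(count_any(text, ["error", "failed", "exception"]))  # 4
```

The single number is the total across all three patterns: one "Error", two "failed", one "exception".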

Labeled Patterns

Pass a dict to get separate results for each category:

@scanner(messages="all")
def categorized_issues() -> Scanner[Transcript]:
    return grep_scanner({
        "errors": ["error", "failed", "exception"],
        "warnings": ["warning", "caution"],
        "security": ["password", "secret", "token"],
    })

This returns a list[Result], one per label:

# Returns:
# - Result(label="errors", value=3, ...)
# - Result(label="warnings", value=1, ...)
# - Result(label="security", value=0, ...)
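
The per-label counting can be illustrated with a standard-library sketch. This is an assumption about the semantics, not the library's code: count_by_label is a hypothetical helper that treats each label's patterns as a literal, case-insensitive OR group and counts them independently.

```python
import re

def count_by_label(text: str, labeled: dict[str, list[str]]) -> dict[str, int]:
    """Return one match count per label (hypothetical helper, not part of inspect_scout)."""
    counts = {}
    for label, patterns in labeled.items():
        # Each label gets its own OR-combined, literal, case-insensitive pattern.
        combined = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
        counts[label] = len(combined.findall(text))
    return counts

labeled = {
    "errors": ["error", "failed", "exception"],
    "warnings": ["warning", "caution"],
    "security": ["password", "secret", "token"],
}
text = "Warning: login failed with an error. The retry raised an exception."
print(count_by_label(text, labeled))
# {'errors': 3, 'warnings': 1, 'security': 0}
```

A label with no matches still appears, with a count of 0, which mirrors the security entry in the listing above.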

Options

Case Sensitivity

By default, matching is case-insensitive (like grep -i):

grep_scanner("error")  # Matches "error", "ERROR", "Error"

For case-sensitive matching:

grep_scanner("error", ignore_case=False)  # Only matches "error"
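
In standard-library terms, the ignore_case option behaves like toggling re.IGNORECASE. The sketch below assumes that mapping; count_matches is a hypothetical helper, not the library's API.

```python
import re

def count_matches(text: str, pattern: str, ignore_case: bool = True) -> int:
    """Count literal matches, optionally ignoring case (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    return len(re.findall(re.escape(pattern), text, flags))

text = "error, ERROR, Error"
print(count_matches(text, "error"))                     # 3
print(count_matches(text, "error", ignore_case=False))  # 1
```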

Regex Mode

By default, patterns are treated as literal strings:

grep_scanner("file.*txt")  # Matches literal "file.*txt"

Enable regex mode for regular expression patterns:

grep_scanner(r"https?://\S+", regex=True)  # Matches URLs
grep_scanner(r"\d{3}-\d{4}", regex=True)   # Matches 7-digit phone patterns such as 555-1234
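
The difference between the two modes can be sketched with the standard re module. The assumption here is that literal mode is equivalent to escaping the pattern before compiling it; compile_pattern is a hypothetical helper, not something the library exposes.

```python
import re

def compile_pattern(pattern: str, regex: bool = False) -> re.Pattern:
    """Compile a pattern literally unless regex=True (hypothetical helper)."""
    # In literal mode, metacharacters such as "." and "*" are escaped.
    source = pattern if regex else re.escape(pattern)
    return re.compile(source, re.IGNORECASE)

text = "Saved file.*txt and file_report.txt"
literal = compile_pattern("file.*txt")
as_regex = compile_pattern("file.*txt", regex=True)

print(literal.findall(text))   # ['file.*txt']
print(as_regex.findall(text))  # ['file.*txt and file_report.txt']
```

In regex mode the greedy .* spans everything up to the last "txt", so a pattern meant literally can match far more than intended if regex mode is switched on by accident.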

Word Boundary

Match whole words only:

grep_scanner("error", word_boundary=True)
# Matches "error" but not "errorCode" or "myerror"

This adds \b word boundary anchors around the pattern.
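
A short standard-library sketch shows what the \b anchors do. It assumes the pattern is escaped and wrapped as \bPATTERN\b; word_pattern is a hypothetical helper. Note that in Python's re module the underscore counts as a word character, so "error_count" does not match either.

```python
import re

def word_pattern(word: str) -> re.Pattern:
    """Wrap a literal word in \\b anchors (hypothetical helper)."""
    return re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)

pattern = word_pattern("error")
for sample in ["an error occurred", "errorCode = 5", "myerror", "error_count", "Error."]:
    print(f"{sample!r}: {bool(pattern.search(sample))}")
# 'an error occurred': True
# 'errorCode = 5': False
# 'myerror': False
# 'error_count': False
# 'Error.': True
```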

Scanner Results

Result Structure

For single/list patterns:

Field        Type             Description
value        int              Count of matches found
explanation  str | None       Context snippets for each match
references   list[Reference]  Message/event citations

For labeled patterns (dict), each label produces its own Result with the same structure, plus:

Field  Type  Description
label  str   The category label

Context Snippets

The explanation field shows context around each match:

[M1]: ...request returned an **error** code 500...
[M2]: ...operation **error**: connection timeout...
[E1]: TOOL (search_files): Result: Found **error** in file.py...

Each snippet follows these conventions:

  • Up to 50 characters of context before and after each match
  • Match text highlighted with **bold**
  • Ellipsis (...) where context is truncated
  • Reference prefix ([M1], [M2] for messages; [E1], [E2] for events)
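
The snippet format can be approximated in a few lines of plain Python. This is a sketch of the described behavior, not the library's code: snippet is a hypothetical helper, and the exact truncation and whitespace handling in the real scanner may differ. The reference prefix is omitted here.

```python
import re

def snippet(text: str, start: int, end: int, context: int = 50) -> str:
    """Build a context snippet around text[start:end] (hypothetical helper)."""
    left = max(0, start - context)
    right = min(len(text), end + context)
    # Add an ellipsis only on the sides where text was actually cut off.
    prefix = "..." if left > 0 else ""
    suffix = "..." if right < len(text) else ""
    return f"{prefix}{text[left:start]}**{text[start:end]}**{text[end:right]}{suffix}"

text = "The request returned an error code 500"
match = re.search("error", text)
print(snippet(text, match.start(), match.end()))             # full text fits in 50 chars
print(snippet(text, match.start(), match.end(), context=5))  # truncated on both sides
# The request returned an **error** code 500
# ...d an **error** code...
```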

Searching Events

To search events (tool calls, errors, etc.), use the events parameter in the @scanner decorator:

Events Only

@scanner(events=["tool", "error"])
def find_tool_errors() -> Scanner[Transcript]:
    return grep_scanner("timeout")

Both Messages and Events

@scanner(messages="all", events=["tool", "error"])
def find_all_errors() -> Scanner[Transcript]:
    return grep_scanner("error")

The scanner automatically searches whatever is populated in the transcript:

  • Messages are searched if transcript.messages is populated
  • Events are searched if transcript.events is populated

References use [M1], [M2] for messages and [E1], [E2] for events.

Supported Event Types

Event Type  What’s Searched
model       Model output completion text
tool        Function name, arguments, and result
error       Error message
info        Data field (JSON stringified if object)
logger      Log message text
approval    Message, tool call, and decision

Examples

Finding Profanity

@scanner(messages=["assistant"])
def profanity_check() -> Scanner[Transcript]:
    return grep_scanner(
        ["damn", "hell", "crap"],
        word_boundary=True,  # Avoid partial matches
    )

Detecting URLs

@scanner(messages="all")
def url_detection() -> Scanner[Transcript]:
    return grep_scanner(
        r"https?://[^\s<>\"']+",
        regex=True,
    )

Multi-Category Analysis

@scanner(messages=["assistant"])
def content_analysis() -> Scanner[Transcript]:
    return grep_scanner({
        "code_snippets": [r"```", "def ", "class ", "function"],
        "questions": [r"\?$"],
        "commands": ["please", "could you", "can you"],
    }, regex=True)

Searching Tool Events

@scanner(events=["tool"])
def tool_failures() -> Scanner[Transcript]:
    return grep_scanner(
        ["timeout", "connection refused", "permission denied"],
        ignore_case=True,  # Already the default; shown for clarity
    )

Messages and Events

@scanner(messages="all", events=["tool", "error"])
def comprehensive_error_check() -> Scanner[Transcript]:
    return grep_scanner({
        "errors": ["error", "exception", "failed"],
        "warnings": ["warning", "deprecated"],
    })