Results

Overview

The results of scans are stored in directory on the local filesystem (by default ./scans) or in a remote S3 bucket. When a scan job is completed its directory is printed, and you can also use the scan_list() function or scout scan list command to enumerate scan jobs.

Scan results include the following:

Scan configuration (e.g. options passed to scan() or to scout scan).
Transcripts scanned and scanners executed and errors which occurred during the last scan.
A set of Parquet files with scan results (one for each scanner). Functions are available to interface with these files as Pandas data frames.

Workflow

Scout CLI

The scout scan command will print its status at the end of its run. If all of the scanners completed without errors you’ll see a message indicating the scan is complete along with a pointer to the scan directory where results are stored:

You can then pass that directory to the scan_results_df() function to get access to the underlying data frames for each scanner:

from inspect_scout import scan_results

results = scan_results_df("scans/scan_id=3ibJe9cg7eM5zo3h5Hpbr8")
deception_df = results.scanners["deception"]
tool_errors_df = results.scanners["tool_errors"]

Python API

The scan() function returns a Status object which indicates whether the scan completed successfully (in which case the scanner results are available for analysis). You’ll therefore want to check the .completed field before proceeding to read the results. For example:

from inspect_scout import (
    scan, scan_results, transcripts_from_logs
)

from .scanners import ctf_environment, java_tool_calls

status = scan(
    transcripts=transcripts_from_logs("./logs"),
    scanners=[ctf_environment(), java_tool_calls()]
)

if status.complete:
    results = scan_results_df(status.location)
    deception_df = results.scanners["deception"]
    tool_errors_df = results.scanners["tool_errors"]

Results Data

The Results object returned from scan_results_df() includes both metadata about the scan as well as the scanner data frames:

Field	Type	Description
`complete`	bool	Is the job complete? (all transcripts scanned)
`spec`	ScanSpec	Scan specification (transcripts, scanners, options, etc.)
`location`	str	Location of scan directory
`summary`	Summary	Summary of scan (results, errors, tokens, etc.)
`errors`	list[Error]	Errors during last scan attempt.
`scanners`	dict[str, pd.DataFrame]	Results data for each scanner (see Data Frames for details)

Data Frames

The data frames available for each scanner contain information about the source evaluation and transcript, the results found for each transcript, as well as model calls, errors and other events which may have occurred during the scan.

Row Granularity

Note that by default the results data frame will include an individual row for each result returned by a scanner. This means that if a scanner returned multiple results there would be multiple rows all sharing the same transcript_id. You can customize this behavior via the rows option of the scan results functions:

`rows = "results"`	Default. Yield a row for each scanner result (potentially multiple rows per transcript)
`rows = "transcripts"`	Yield a row for each transcript (in which case multiple results will be packed into the `value` field as a JSON list of Result) and the `value_type` will be “resultset”.

Available Fields

The data frame includes the following fields (note that some fields included embedded JSON data, these are all noted below):

Field	Type	Description
`transcript_id`	str	Globally unique identifier for a transcript (maps to `EvalSample.uuid` in the Inspect log or `sample_id` in Inspect analysis data frames).
`transcript_source_id`	str	Globally unique identifier for a transcript source (maps to `eval_id` in the Inspect log and analysis data frames).
`transcript_source_uri`	str	URI for source data (e.g. full path to the Inspect log file).
`transcript_metadata`	dict JSON	Eval configuration metadata (e.g. task, model, scores, etc.).
`scan_id`	str	Globally unique identifier for scan.
`scan_tags`	list[str] JSON	Tags associated with the scan.
`scan_metadata`	dict JSON	Additional scan metadata.
`scanner_key`	str	Unique key for scan within scan job (defaults to `scanner_name`).
`scanner_name`	str	Scanner name.
`scanner_file`	str	Source file for scanner.
`scanner_params`	dict JSON	Params used to create scanner.
`input_type`	transcript \| message \| messages \| event \| events	Input type received by scanner.
`input_ids`	list[str] JSON	Unique ids of scanner input.
`input`	ScannerInput JSON	Scanner input value.
`uuid`	str	Globally unique id for scan result.
`label`	str	Label for the origin of the result (optional).
`value`	JsonValue JSON	Value returned by scanner.
`value_type`	string \| boolean \| number \| array \| object \| null	Type of value returned by scanner.
`answer`	str	Answer extracted from scanner generation.
`explanation`	str	Explanation for scan result.
`metadata`	dict JSON	Metadata for scan result.
`message_references`	list[Reference] JSON	Messages referenced by scanner.
`event_references`	list[Reference] JSON	Events referenced by scanner.
`validation_target`	JsonValue JSON	Target value from validation set.
`validation_result`	JsonValue JSON	Result returned from comparing `validation_target` to `value`.
`scan_error`	str	Error which occurred during scan.
`scan_error_traceback`	str	Traceback for error (if any)
`scan_events`	list[Event] JSON	Scan events (e.g. model event, log event, etc.)
`scan_total_tokens`	number	Total tokens used by scan (only included when `rows = "transcripts"`).
`scan_model_usage`	dict [str, ModelUsage] JSON	Token usage by model for scan (only included when `rows = "transcripts"`).

Several of these fields can be used to link back to the source eval log and sample for the transcript:

transcript_id — This is the same as the EvalSample.uuid in the Inspect log or the sample_id in data frames created by samples_df().
transcript_source_id — This is the same as the eval_id in both the Inspect log and Inspect data frames.
transcript_source_uri — This is the full path (filesystem or S3) to the actual log file where the transcript was read from.