Results

Overview

The results of scans are stored in directory on the local filesystem (by default ./scans) or in a remote S3 bucket. When a scan job is completed its directory is printed, and you can also use the scan_list() function or scout scan list command to enumerate scan jobs.

Scan results include the following:

  • Scan configuration (e.g. options passed to scan() or to scout scan).

  • Transcripts scanned and scanners executed and errors which occurred during the last scan.

  • A set of Parquet files with scan results (one for each scanner). Functions are available to interface with these files as Pandas data frames.

Workflow

Scout CLI

The scout scan command will print its status at the end of its run. If all of the scanners completed without errors you’ll see a message indicating the scan is complete along with a pointer to the scan directory where results are stored:

If you are running in VS Code, you can click the scan directory to view the results in Scout View. If you are using another editor, execute scout view from the terminal to launch the viewer:

scout view

To get programmatic access to the results, pass the scan directory to the scan_results_df() function:

from inspect_scout import scan_results_df

results = scan_results_df("scans/scan_id=3ibJe9cg7eM5zo3h5Hpbr8")
deception_df = results.scanners["deception"]
tool_errors_df = results.scanners["tool_errors"]

Python API

The scan() function returns a Status object which indicates whether the scan completed successfully (in which case the scanner results are available for analysis). You’ll therefore want to check the .completed field before proceeding to read the results. For example:

from inspect_scout import (
    scan, scan_results, transcripts_from
)

from .scanners import ctf_environment, java_tool_calls

status = scan(
    transcripts=transcripts_from("./logs"),
    scanners=[ctf_environment(), java_tool_calls()]
)

if status.complete:
    results = scan_results_df(status.location)
    deception_df = results.scanners["deception"]
    tool_errors_df = results.scanners["tool_errors"]

Results Data

The Results object returned from scan_results_df() includes both metadata about the scan as well as the scanner data frames:

Field Type Description
complete bool Is the job complete? (all transcripts scanned)
spec ScanSpec Scan specification (transcripts, scanners, options, etc.)
location str Location of scan directory
summary Summary Summary of scan (results, metrics, errors, tokens, etc.)
errors list[Error] Errors during last scan attempt.
scanners dict[str, pd.DataFrame] Results data for each scanner (see Data Frames for details)

Data Frames

The data frames available for each scanner contain information about the source evaluation and transcript, the results found for each transcript, as well as model calls, errors and other events which may have occurred during the scan.

Row Granularity

Note that by default the results data frame will include an individual row for each result returned by a scanner. This means that if a scanner returned multiple results there would be multiple rows all sharing the same transcript_id. You can customize this behavior via the rows option of the scan results functions:

rows = "results" Default. Yield a row for each scanner result (potentially multiple rows per transcript)
rows = "transcripts" Yield a row for each transcript (in which case multiple results will be packed into the value field as a JSON list of Result) and the value_type will be “resultset”.

Available Fields

The data frame includes the following fields (note that some fields included embedded JSON data, these are all noted below):

Field Type Description
transcript_id str Globally unique identifier for a transcript (e.g. sample uuid in the Inspect log).
transcript_source_type str Type of transcript source (e.g. “eval_log”).
transcript_source_id str Globally unique identifier for a transcript source (maps to eval_id in the Inspect log and analysis data frames).
transcript_source_uri str URI for source data (e.g. full path to the Inspect log file).
transcript_date str ISO 8601 datetime when the transcript was created.
transcript_task_set str Set from which transcript task was drawn (e.g. Inspect task name or benchmark name)
transcript_task_id str Identifier for task (e.g. dataset sample id).
transcript_task_repeat int Repeat for a given task id within a task set (e.g. epoch).
transcript_agent str Agent used to to execute task.
transcript_agent_args dict
JSON
Arguments passed to create agent.
transcript_model str Main model used by agent.
transcript_model_options JsonValue
JSON
Generation options for main model.
transcript_score JsonValue
JSON
Value indicating score on task.
transcript_success bool Boolean reduction of score to succeeded/failed.
transcript_total_time number Time required to execute task (seconds)
transcript_total_tokens number Tokens spent in execution of task.
transcript_error str Error message that terminated the task.
transcript_limit str Limit that caused the task to exit (e.g. “tokens”, “messages, etc.)
transcript_metadata dict
JSON
Source specific metadata.
scan_id str Globally unique identifier for scan.
scan_tags list[str]
JSON
Tags associated with the scan.
scan_metadata dict
JSON
Additional scan metadata.
scan_git_origin str Git origin for repo where scan was run from.
scan_git_version str Git version (based on tags) for repo where scan was run from.
scan_git_commit str Git commit for repo where scan was run from.
scanner_key str Unique key for scan within scan job (defaults to scanner_name).
scanner_name str Scanner name.
scanner_version int Scanner version.
scanner_package_version int Scanner package version.
scanner_file str Source file for scanner.
scanner_params dict
JSON
Params used to create scanner.
input_type transcript | message | messages | event | events Input type received by scanner.
input_ids list[str]
JSON
Unique ids of scanner input.
input ScannerInput
JSON
Scanner input value.
uuid str Globally unique id for scan result.
label str Label for the origin of the result (optional).
value JsonValue
JSON
Value returned by scanner.
value_type string | boolean | number | array | object | null Type of value returned by scanner.
answer str Answer extracted from scanner generation.
explanation str Explanation for scan result.
metadata dict
JSON
Metadata for scan result.
message_references list[Reference]
JSON
Messages referenced by scanner.
event_references list[Reference]
JSON
Events referenced by scanner.
validation_target JsonValue
JSON
Target value from validation set.
validation_result JsonValue
JSON
Result returned from comparing validation_target to value.
scan_error str Error which occurred during scan.
scan_error_traceback str Traceback for error (if any)
scan_error_type str Error type (either “refusal” for refusals or null for other errors).
scan_events list[Event]
JSON
Scan events (e.g. model event, log event, etc.)
scan_total_tokens number Total tokens used by scan (only included when rows = "transcripts").
scan_model_usage dict [str, ModelUsage]
JSON
Token usage by model for scan (only included when rows = "transcripts").