Async API

Note

The Async API is available for async programs that want to use inspect_scout as an embedded library.

Normal usage of Scout (e.g. in a script or notebook) should prefer the corresponding sync functions (e.g. scan(), scan_resume(), etc.). These provide optimal parallelism (sharing transcript parses across scanners, using multiple processes, etc.). Making multiple concurrent calls to scan_async() instead loses the pooled transcript parsing and creates unwanted resource contention.

scan_async

Scan transcripts.

Scan transcripts using one or more scanners. Note that scanners must each have a unique name. If you have more than one instance of a scanner with the same name, numbered prefixes will be automatically assigned. Alternatively, you can pass tuples of (name, scanner) or a dict with explicit names for each scanner.

async def scan_async(
    scanners: Sequence[Scanner[Any] | tuple[str, Scanner[Any]]]
    | dict[str, Scanner[Any]]
    | ScanJob
    | ScanJobConfig,
    transcripts: Transcripts | None = None,
    results: str | None = None,
    worklist: Sequence[ScannerWork] | str | Path | None = None,
    validation: ValidationSet | dict[str, ValidationSet] | None = None,
    model: str | Model | None = None,
    model_config: GenerateConfig | None = None,
    model_base_url: str | None = None,
    model_args: dict[str, Any] | str | None = None,
    model_roles: dict[str, str | Model] | None = None,
    max_transcripts: int | None = None,
    max_processes: int | None = None,
    limit: int | None = None,
    shuffle: bool | int | None = None,
    tags: list[str] | None = None,
    metadata: dict[str, Any] | None = None,
    log_level: str | None = None,
) -> Status

scanners Sequence[Scanner[Any] | tuple[str, Scanner[Any]]] | dict[str, Scanner[Any]] | ScanJob | ScanJobConfig: Scanners to execute (a list, a dict with explicit names, a ScanJob, or a ScanJobConfig). If a ScanJob or ScanJobConfig is specified, its options are used as the default options for the scan.
transcripts Transcripts | None: Transcripts to scan.
results str | None: Location to write results (filesystem or S3 bucket). Defaults to “./scans”.
worklist Sequence[ScannerWork] | str | Path | None: Transcript ids to process for each scanner (defaults to processing all transcripts). Either a list of ScannerWork or a YAML or JSON file containing the same.
validation ValidationSet | dict[str, ValidationSet] | None: Validation cases to apply for scanners.
model str | Model | None: Model to use for scanning by default (individual scanners can always call get_model() to use arbitrary models). If not specified, the value of the SCOUT_SCAN_MODEL environment variable is used.
model_config GenerateConfig | None: GenerateConfig for calls to the model.
model_base_url str | None: Base URL for communicating with the model API.
model_args dict[str, Any] | str | None: Model creation args (as a dictionary or as a path to a JSON or YAML config file).
model_roles dict[str, str | Model] | None: Named roles for use in get_model().
max_transcripts int | None: The maximum number of transcripts to process concurrently (this also serves as the default value for max_connections). Defaults to 25.
max_processes int | None: The maximum number of concurrent processes (for multiprocessing). Defaults to multiprocessing.cpu_count().
limit int | None: Limit the number of transcripts processed.
shuffle bool | int | None: Shuffle the order of transcripts (pass an int to set a seed for shuffling).
tags list[str] | None: One or more tags for this scan.
metadata dict[str, Any] | None: Metadata for this scan.
log_level str | None: Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
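A minimal sketch of calling scan_async() from an application that already runs its own event loop. Following the note above, it passes every scanner in one call as a dict with explicit names rather than launching several concurrent scans. The helper name run_scan and its `scan` parameter (which lets a stand-in replace scan_async() for testing) are hypothetical, and the sketch assumes scan_async() is importable from the top-level inspect_scout package.

```python
from __future__ import annotations

from typing import Any, Awaitable, Callable, Mapping


async def run_scan(
    scanners: Mapping[str, Any],
    transcripts: Any,
    *,
    results: str = "./scans",
    scan: Callable[..., Awaitable[Any]] | None = None,
) -> Any:
    """Run every scanner in a single scan_async() call (hypothetical helper)."""
    if scan is None:
        # Assumption: scan_async is exported by the top-level package.
        from inspect_scout import scan_async as scan
    # One call lets Scout share transcript parsing across all scanners.
    # Avoid asyncio.gather() over several scan_async() calls instead.
    return await scan(dict(scanners), transcripts=transcripts, results=results)


# Example (inside your own async code; names are placeholders):
# status = await run_scan({"refusal": refusal, "toxicity": toxicity}, transcripts)
```

Passing the dict keeps scanner names explicit, so no automatic numbering is needed.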

scan_resume_async

Resume a previous scan.

async def scan_resume_async(scan_location: str, log_level: str | None = None) -> Status

scan_location str: Scan location to resume from.
log_level str | None: Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
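As a sketch, scan_resume_async() can be retried a bounded number of times after an interruption, stopping as soon as the returned Status reports completion. The sketch assumes Status has a boolean `complete` field and that scan_resume_async() is importable from the top-level inspect_scout package; resume_until_complete, `max_attempts`, and the injectable `resume` parameter are hypothetical. Retrying only helps with transient failures (for example rate limits or dropped connections); transcripts that fail deterministically will fail again.

```python
from __future__ import annotations

from typing import Any, Awaitable, Callable


async def resume_until_complete(
    scan_location: str,
    *,
    max_attempts: int = 3,
    resume: Callable[[str], Awaitable[Any]] | None = None,
) -> Any:
    """Resume a scan up to max_attempts times (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if resume is None:
        # Assumption: scan_resume_async is exported by the top-level package.
        from inspect_scout import scan_resume_async as resume
    status = await resume(scan_location)
    attempts = 1
    # Assumption: Status exposes a boolean `complete` field.
    while not status.complete and attempts < max_attempts:
        status = await resume(scan_location)
        attempts += 1
    return status
```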

scan_complete_async

Complete a scan.

This function is used to indicate that a scan with errors in some transcripts should be completed in spite of the errors.

async def scan_complete_async(
    scan_location: str, log_level: str | None = None
) -> Status

scan_location str: Scan location to complete.
log_level str | None: Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
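The following sketch shows where scan_complete_async() fits: check a scan's status and mark it complete only when the caller has explicitly decided the remaining transcript errors are acceptable. It assumes Status has a boolean `complete` field and that both functions are importable from the top-level inspect_scout package; finalize_scan, `accept_errors`, and the injectable `status_fn`/`complete_fn` parameters are hypothetical.

```python
from __future__ import annotations

from typing import Any, Awaitable, Callable

StatusFn = Callable[[str], Awaitable[Any]]


async def finalize_scan(
    scan_location: str,
    *,
    accept_errors: bool = False,
    status_fn: StatusFn | None = None,
    complete_fn: StatusFn | None = None,
) -> Any:
    """Complete a scan despite errors only when explicitly allowed (hypothetical)."""
    if status_fn is None:
        # Assumption: both functions are exported by the top-level package.
        from inspect_scout import scan_status_async as status_fn
    if complete_fn is None:
        from inspect_scout import scan_complete_async as complete_fn
    status = await status_fn(scan_location)
    # Assumption: Status exposes a boolean `complete` field.
    if status.complete or not accept_errors:
        return status
    # The caller has accepted the errored transcripts: mark the scan complete.
    return await complete_fn(scan_location)
```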

scan_list_async

List completed and pending scans.

async def scan_list_async(scans_location: str) -> list[Status]

scans_location str: Location of scans to list.
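Because scan_list_async() returns both completed and pending scans, a caller looking for work to resume might filter the list. This sketch assumes Status has a boolean `complete` field that is false for pending scans, and that scan_list_async() is importable from the top-level inspect_scout package; pending_scans and its `list_fn` parameter are hypothetical.

```python
from __future__ import annotations

from typing import Any, Awaitable, Callable


async def pending_scans(
    scans_location: str,
    *,
    list_fn: Callable[[str], Awaitable[list[Any]]] | None = None,
) -> list[Any]:
    """Return only the scans that are not yet complete (hypothetical helper)."""
    if list_fn is None:
        # Assumption: scan_list_async is exported by the top-level package.
        from inspect_scout import scan_list_async as list_fn
    statuses = await list_fn(scans_location)
    # Assumption: Status exposes a boolean `complete` field.
    return [status for status in statuses if not status.complete]
```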

scan_status_async

Status of scan.

async def scan_status_async(scan_location: str) -> Status

scan_location str: Location to get status for (e.g. a directory or S3 bucket).
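When an application tracks several scans, their statuses can be fetched concurrently with asyncio.gather(). This sketch assumes status lookups only read existing scan data and so do not cause the contention described for concurrent scan_async() calls; statuses_for and its `status_fn` parameter are hypothetical, as is importing scan_status_async() from the top-level inspect_scout package.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def statuses_for(
    scan_locations: Sequence[str],
    *,
    status_fn: Callable[[str], Awaitable[Any]] | None = None,
) -> dict[str, Any]:
    """Fetch the status of several scans concurrently (hypothetical helper)."""
    if status_fn is None:
        # Assumption: scan_status_async is exported by the top-level package.
        from inspect_scout import scan_status_async as status_fn
    # Assumption: status lookups only read existing scan data, so running
    # them together is unlike running several scan_async() calls at once.
    statuses = await asyncio.gather(*(status_fn(loc) for loc in scan_locations))
    # gather() preserves input order, so zip pairs each location with its status.
    return dict(zip(scan_locations, statuses))
```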

scan_results_df_async

Scan results as pandas data frames.

async def scan_results_df_async(
    scan_location: str,
    *,
    scanner: str | None = None,
    rows: Literal["results", "transcripts"] = "results",
) -> ScanResultsDF

scan_location str: Location of scan (e.g. a directory or S3 bucket).
scanner str | None: Scanner name (defaults to all scanners).
rows Literal['results', 'transcripts']: Row granularity. Specify “results” to yield a row for each scanner result (potentially multiple per transcript); specify “transcripts” to yield a row for each transcript (in which case multiple results will be packed into the value field as a JSON list of Result).
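A sketch of loading one scanner's results with rows="transcripts" and decoding the packed value field. It assumes the returned ScanResultsDF exposes a `scanners` dict mapping scanner names to data frames, that packed values arrive as JSON text, and that scan_results_df_async() is importable from the top-level inspect_scout package; transcript_rows, unpack_value, and the `results_df_fn` parameter are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any, Awaitable, Callable


async def transcript_rows(
    scan_location: str,
    scanner: str,
    *,
    results_df_fn: Callable[..., Awaitable[Any]] | None = None,
) -> Any:
    """Load one scanner's results with one row per transcript (hypothetical)."""
    if results_df_fn is None:
        # Assumption: scan_results_df_async is exported by the top-level package.
        from inspect_scout import scan_results_df_async as results_df_fn
    results = await results_df_fn(scan_location, scanner=scanner, rows="transcripts")
    # Assumption: ScanResultsDF exposes a `scanners` dict of name -> DataFrame.
    return results.scanners[scanner]


def unpack_value(value: Any) -> Any:
    """Decode a packed `value` cell into a list of results (hypothetical)."""
    # Assumption: packed cells hold JSON text; anything else passes through.
    return json.loads(value) if isinstance(value, str) else value
```

With pandas, the packed column could then be decoded via `df["value"].map(unpack_value)`.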