Async API
The Async API is available for async programs that want to use inspect_scout as an embedded library.
Normal usage of Scout (e.g. in a script or notebook) should prefer the corresponding sync functions (e.g. scan(), scan_resume(), etc.). These provide better parallelism (sharing transcript parses across scanners, using multiple processes, etc.) than multiple concurrent calls to scan_async(), which would lose the pooled transcript parsing and create unwanted resource contention.
scan_async
Scan transcripts.
Scan transcripts using one or more scanners. Note that scanners must each have a unique name. If you have more than one instance of a scanner with the same name, numbered prefixes will be automatically assigned. Alternatively, you can pass tuples of (name, scanner) or a dict with explicit names for each scanner.
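The two explicit-naming forms mentioned above can be sketched as follows. This is a minimal illustration only: named_scanners is a hypothetical helper (not part of inspect_scout), and plain placeholder strings stand in for real Scanner instances. It turns (name, scanner) tuples into a dict and rejects duplicate names instead of relying on automatic numbering.

```python
from typing import Any, Sequence


def named_scanners(pairs: Sequence[tuple[str, Any]]) -> dict[str, Any]:
    """Build a name -> scanner dict from (name, scanner) tuples.

    Hypothetical helper for illustration; raises if a name repeats so that
    every scanner ends up with an explicit, unique name.
    """
    named: dict[str, Any] = {}
    for name, scanner in pairs:
        if name in named:
            raise ValueError(f"duplicate scanner name: {name!r}")
        named[name] = scanner
    return named


# Placeholder strings stand in for real Scanner instances.
pairs = [("refusal_strict", "scanner-a"), ("refusal_lenient", "scanner-b")]
scanners = named_scanners(pairs)
print(list(scanners))
```

Both pairs (a sequence of tuples) and scanners (a dict) have a shape that the scanners argument is described as accepting.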
async def scan_async(
    scanners: Sequence[Scanner[Any] | tuple[str, Scanner[Any]]]
    | dict[str, Scanner[Any]]
    | ScanJob
    | ScanJobConfig,
    transcripts: Transcripts | None = None,
    results: str | None = None,
    worklist: Sequence[ScannerWork] | str | Path | None = None,
    validation: ValidationSet | dict[str, ValidationSet] | None = None,
    model: str | Model | None = None,
    model_config: GenerateConfig | None = None,
    model_base_url: str | None = None,
    model_args: dict[str, Any] | str | None = None,
    model_roles: dict[str, str | Model] | None = None,
    max_transcripts: int | None = None,
    max_processes: int | None = None,
    limit: int | None = None,
    shuffle: bool | int | None = None,
    tags: list[str] | None = None,
    metadata: dict[str, Any] | None = None,
    log_level: str | None = None,
) -> Status
scanners: Sequence[Scanner[Any] | tuple[str, Scanner[Any]]] | dict[str, Scanner[Any]] | ScanJob | ScanJobConfig
Scanners to execute (list, dict with explicit names, or ScanJob). If a ScanJob or ScanJobConfig is specified, then its options are used as the default options for the scan.
transcripts: Transcripts | None
Transcripts to scan.
results: str | None
Location to write results (filesystem or S3 bucket). Defaults to “./scans”.
worklist: Sequence[ScannerWork] | str | Path | None
Transcript ids to process for each scanner (defaults to processing all transcripts). Either a list of ScannerWork or a YAML or JSON file containing the same.
validation: ValidationSet | dict[str, ValidationSet] | None
Validation cases to apply for scanners.
model: str | Model | None
Model to use for scanning by default (individual scanners can always call get_model() to use arbitrary models). If not specified, the value of the SCOUT_SCAN_MODEL environment variable is used.
model_config: GenerateConfig | None
GenerateConfig for calls to the model.
model_base_url: str | None
Base URL for communicating with the model API.
model_args: dict[str, Any] | str | None
Model creation args (as a dictionary or as a path to a JSON or YAML config file).
model_roles: dict[str, str | Model] | None
Named roles for use in get_model().
max_transcripts: int | None
The maximum number of transcripts to process concurrently (this also serves as the default value for max_connections). Defaults to 25.
max_processes: int | None
The maximum number of concurrent processes (for multiprocessing). Defaults to multiprocessing.cpu_count().
limit: int | None
Limit the number of transcripts processed.
shuffle: bool | int | None
Shuffle the order of transcripts (pass an int to set a seed for shuffling).
tags: list[str] | None
One or more tags for this scan.
metadata: dict[str, Any] | None
Metadata for this scan.
log_level: str | None
Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
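As a usage sketch, an async application might wrap the call as shown here. To keep the example runnable without Scout installed, the scan function is passed in as a parameter: run_scan and fake_scan_async are hypothetical names, and the tag, limit, and seed values are arbitrary. In real use you would pass scan_async itself (assumed to be importable from the inspect_scout package) together with real scanners and transcripts. Only keyword arguments from the signature above are used.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_scan(
    scan_fn: Callable[..., Awaitable[Any]],
    scanners: dict[str, Any],
    transcripts: Any,
) -> Any:
    """Await a scan from inside an already-running event loop."""
    return await scan_fn(
        scanners,
        transcripts=transcripts,
        results="./scans",    # the documented default, made explicit
        max_transcripts=10,   # lower than the documented default of 25
        limit=100,            # process at most 100 transcripts
        shuffle=42,           # an int seeds the shuffle
        tags=["nightly"],
        log_level="info",
    )


async def fake_scan_async(scanners: Any, **options: Any) -> dict[str, Any]:
    """Stand-in for scan_async so the sketch runs without inspect_scout."""
    await asyncio.sleep(0)
    return {"scanners": sorted(scanners), "options": options}


status = asyncio.run(
    run_scan(fake_scan_async, {"my_scanner": object()}, transcripts=None)
)
print(status["options"]["limit"])
```

Passing the scan function in as an argument also makes the wrapper easy to unit test, since a stub can record the options it receives.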
scan_resume_async
Resume a previous scan.
async def scan_resume_async(scan_location: str, log_level: str | None = None) -> Status
scan_location: str
Scan location to resume from.
log_level: str | None
Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
scan_complete_async
Complete a scan.
This function is used to indicate that a scan with errors in some transcripts should be completed in spite of the errors.
async def scan_complete_async(
    scan_location: str, log_level: str | None = None
) -> Status
scan_location: str
Scan location to complete.
log_level: str | None
Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
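One way to combine resuming and completing is a bounded retry: resume a scan a few times, then mark it complete in spite of any remaining errors. The sketch below is an illustration under stated assumptions, not Scout's own logic. finish_scan is a hypothetical helper, the resume and complete functions are injected so the block runs standalone (in real use they would be scan_resume_async and scan_complete_async), the location is a made-up path, and the code assumes the returned Status exposes a boolean complete attribute, which this page does not document.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

ScanFn = Callable[[str], Awaitable[Any]]


async def finish_scan(
    scan_location: str,
    resume_fn: ScanFn,
    complete_fn: ScanFn,
    max_resumes: int = 2,
) -> Any:
    """Resume up to max_resumes times, then force completion."""
    for _ in range(max_resumes):
        status = await resume_fn(scan_location)
        # ASSUMPTION: Status has a boolean `complete` attribute.
        if getattr(status, "complete", False):
            return status
    # Errors persist: accept the scan as complete despite them.
    return await complete_fn(scan_location)


@dataclass
class FakeStatus:
    """Stand-in for Status, used only to make the sketch runnable."""

    complete: bool
    forced: bool = False


calls: list[str] = []


async def fake_resume(location: str) -> FakeStatus:
    calls.append("resume")
    return FakeStatus(complete=False)  # simulate persistent errors


async def fake_complete(location: str) -> FakeStatus:
    calls.append("complete")
    return FakeStatus(complete=True, forced=True)


final = asyncio.run(finish_scan("./scans/example", fake_resume, fake_complete))
print(calls, final.forced)
```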
scan_list_async
List completed and pending scans.
async def scan_list_async(scans_location: str) -> list[Status]
scans_location: str
Location of scans to list.
scan_status_async
Status of scan.
async def scan_status_async(scan_location: str) -> Status
scan_location: str
Location to get status for (e.g. directory or S3 bucket).
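Because the status lookup is a coroutine, several lookups can be issued concurrently with asyncio.gather. This is a minimal sketch, assuming the status function is passed in (in real use, scan_status_async). statuses_for, fake_status, and the two locations are hypothetical. This pattern is meant for read-only lookups and not for concurrent scan_async calls, which the introduction advises against.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def statuses_for(
    locations: Sequence[str],
    status_fn: Callable[[str], Awaitable[Any]],
) -> dict[str, Any]:
    """Fetch the status of several scan locations concurrently."""
    results = await asyncio.gather(*(status_fn(loc) for loc in locations))
    # gather preserves input order, so zip pairs each location with its status.
    return dict(zip(locations, results))


async def fake_status(scan_location: str) -> str:
    """Stand-in for scan_status_async so the sketch runs standalone."""
    await asyncio.sleep(0)
    return f"status of {scan_location}"


locations = ["./scans/a", "./scans/b"]
by_location = asyncio.run(statuses_for(locations, fake_status))
print(by_location)
```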
scan_results_df_async
Scan results as Pandas data frames.
async def scan_results_df_async(
    scan_location: str,
    *,
    scanner: str | None = None,
    rows: Literal["results", "transcripts"] = "results",
) -> ScanResultsDF
scan_location: str
Location of scan (e.g. directory or S3 bucket).
scanner: str | None
Scanner name (defaults to all scanners).
rows: Literal['results', 'transcripts']
Row granularity. Specify “results” to yield a row for each scanner result (potentially multiple per transcript); specify “transcripts” to yield a row for each transcript (in which case multiple results will be packed into the value field as a JSON list of Result).
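The difference between the two row granularities can be illustrated without Scout or pandas. In this sketch the records, the transcript_id key, and pack_by_transcript are hypothetical, and results are simplified to plain strings instead of full Result objects. Only the idea of packing multiple results into a JSON list in the value field comes from the description above.

```python
import json
from collections import defaultdict
from typing import Any

# rows="results": one record per scanner result (two for transcript "t1").
result_rows: list[dict[str, Any]] = [
    {"transcript_id": "t1", "value": "refusal"},
    {"transcript_id": "t1", "value": "apology"},
    {"transcript_id": "t2", "value": "refusal"},
]


def pack_by_transcript(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collapse result rows to one row per transcript (rows="transcripts")."""
    grouped: dict[str, list[Any]] = defaultdict(list)
    for row in rows:
        grouped[row["transcript_id"]].append(row["value"])
    # Each transcript's results are packed into `value` as a JSON list.
    return [
        {"transcript_id": tid, "value": json.dumps(values)}
        for tid, values in grouped.items()
    ]


transcript_rows = pack_by_transcript(result_rows)
print(transcript_rows)
```

Consumers of the per-transcript form would decode the value field with json.loads to get back the individual results.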