Scanning
Scanning
scan
Scan transcripts.
Scan transcripts using one or more scanners. Each scanner must have a unique name. If you pass more than one instance of a scanner with the same name, numbered prefixes are assigned automatically. Alternatively, you can pass tuples of (name, scanner) or a dict with explicit names for each scanner.
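The two explicit naming forms described above can be pictured with a small stand-in. This is a sketch, not the library's implementation: `explicit_names` and `refusal_scanner` are hypothetical, scanners are represented by plain callables, and raising on a duplicate explicit name is an assumption made for illustration.

```python
from __future__ import annotations

from typing import Callable, Mapping, Sequence

# Stand-in for a scanner: any callable. Real scanners come from the library.
ScannerLike = Callable[..., object]


def explicit_names(
    scanners: Sequence[tuple[str, ScannerLike]] | Mapping[str, ScannerLike],
) -> dict[str, ScannerLike]:
    """Normalize (name, scanner) tuples or a dict into one name -> scanner dict."""
    if isinstance(scanners, Mapping):
        pairs = list(scanners.items())
    else:
        pairs = list(scanners)
    named: dict[str, ScannerLike] = {}
    for name, scanner in pairs:
        if name in named:
            # Assumption: this sketch rejects duplicate explicit names.
            raise ValueError(f"duplicate scanner name: {name!r}")
        named[name] = scanner
    return named


def refusal_scanner(transcript: str) -> bool:
    """Toy scanner: flags transcripts that contain a refusal phrase."""
    return "cannot help" in transcript.lower()


# Two instances of the same scanner, distinguished by explicit names.
as_tuples = explicit_names(
    [("refusal_a", refusal_scanner), ("refusal_b", refusal_scanner)]
)
as_dict = explicit_names(
    {"refusal_a": refusal_scanner, "refusal_b": refusal_scanner}
)
```

Both forms yield the same name-to-scanner mapping, which is why either one avoids the automatic numbering.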
def scan(
    scanners: Sequence[Scanner[Any] | tuple[str, Scanner[Any]]]
    | dict[str, Scanner[Any]]
    | ScanJob
    | ScanJobConfig,
    transcripts: Transcripts | None = None,
    results: str | None = None,
    worklist: Sequence[ScannerWork] | str | Path | None = None,
    validation: ValidationSet | dict[str, ValidationSet] | None = None,
    model: str | Model | None = None,
    model_config: GenerateConfig | None = None,
    model_base_url: str | None = None,
    model_args: dict[str, Any] | str | None = None,
    model_roles: dict[str, str | Model] | None = None,
    max_transcripts: int | None = None,
    max_processes: int | None = None,
    limit: int | None = None,
    shuffle: bool | int | None = None,
    tags: list[str] | None = None,
    metadata: dict[str, Any] | None = None,
    display: DisplayType | None = None,
    log_level: str | None = None,
) -> Status
scanners: Sequence[Scanner[Any] | tuple[str, Scanner[Any]]] | dict[str, Scanner[Any]] | ScanJob | ScanJobConfig
Scanners to execute (list, dict with explicit names, or ScanJob). If a ScanJob or ScanJobConfig is specified, then its options are used as the default options for the scan.
transcripts: Transcripts | None
Transcripts to scan.
results: str | None
Location to write results (filesystem or S3 bucket). Defaults to “./scans”.
worklist: Sequence[ScannerWork] | str | Path | None
Transcript ids to process for each scanner (defaults to processing all transcripts). Either a list of ScannerWork or a YAML or JSON file containing the same.
validation: ValidationSet | dict[str, ValidationSet] | None
Validation cases to evaluate for scanners.
model: str | Model | None
Model to use for scanning by default (individual scanners can always call get_model() to use arbitrary models). If not specified, the value of the SCOUT_SCAN_MODEL environment variable is used.
model_config: GenerateConfig | None
GenerateConfig for calls to the model.
model_base_url: str | None
Base URL for communicating with the model API.
model_args: dict[str, Any] | str | None
Model creation args (as a dictionary or as a path to a JSON or YAML config file).
model_roles: dict[str, str | Model] | None
Named roles for use in get_model().
max_transcripts: int | None
The maximum number of transcripts to process concurrently (this also serves as the default value for max_connections). Defaults to 25.
max_processes: int | None
The maximum number of concurrent processes (for multiprocessing). Defaults to multiprocessing.cpu_count().
limit: int | None
Limit the number of transcripts processed.
shuffle: bool | int | None
Shuffle the order of transcripts (pass an int to set a seed for shuffling).
tags: list[str] | None
One or more tags for this scan.
metadata: dict[str, Any] | None
Metadata for this scan.
display: DisplayType | None
Display type: “rich”, “plain”, or “none” (defaults to “rich”).
log_level: str | None
Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
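What a cap on concurrently processed transcripts means in practice can be pictured with a semaphore. The sketch below is illustrative only: `process_all` is a hypothetical helper, the sleep stands in for real scanning work, and the library's actual scheduling may differ. The default of 25 is taken from the max_transcripts description.

```python
import asyncio


async def process_all(transcript_ids: list[str], max_transcripts: int = 25) -> int:
    """Process every id with at most `max_transcripts` in flight; return the peak."""
    semaphore = asyncio.Semaphore(max_transcripts)
    in_flight = 0
    peak = 0

    async def process(transcript_id: str) -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # placeholder for real scanning work
            in_flight -= 1

    await asyncio.gather(*(process(t) for t in transcript_ids))
    return peak


# Ten hypothetical transcript ids, never more than three processed at once.
peak = asyncio.run(process_all([f"t{i}" for i in range(10)], max_transcripts=3))
```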
scan_resume
Resume a previous scan.
def scan_resume(
    scan_location: str,
    display: DisplayType | None = None,
    log_level: str | None = None,
) -> Status
scan_location: str
Scan location to resume from.
display: DisplayType | None
Display type: “rich”, “plain”, or “none” (defaults to “rich”).
log_level: str | None
Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
scan_complete
Complete a scan.
This function is used to indicate that a scan with errors in some transcripts should be completed in spite of the errors.
def scan_complete(
    scan_location: str,
    display: DisplayType | None = None,
    log_level: str | None = None,
) -> Status
scan_location: str
Scan location to complete.
display: DisplayType | None
Display type: “rich”, “plain”, or “none” (defaults to “rich”).
log_level: str | None
Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
Jobs
scanjob
Decorator for registering scan jobs.
def scanjob(
    func: ScanJobType | None = None, *, name: str | None = None
) -> ScanJobType | Callable[[ScanJobType], ScanJobType]
func: ScanJobType | None
Function returning ScanJob.
name: str | None
Optional name for scanjob (defaults to function name).
ScanJob
Scan job definition.
class ScanJob
Attributes
name: str
Name of scan job (defaults to @scanjob function name).
transcripts: Transcripts | None
Transcripts to scan.
worklist: Sequence[ScannerWork] | None
Transcript ids to process for each scanner (defaults to processing all transcripts).
validation: dict[str, ValidationSet] | None
Validation cases to apply.
scanners: dict[str, Scanner[Any]]
Scanners to apply to transcripts.
results: str | None
Location to write results (filesystem or S3 bucket). Defaults to “./scans”.
model: Model | None
Model to use for scanning by default (individual scanners can always call get_model() to use arbitrary models). If not specified, the value of the SCOUT_SCAN_MODEL environment variable is used.
model_base_url: str | None
Base URL for communicating with the model API.
model_args: dict[str, Any] | None
Model creation args (as a dictionary or as a path to a JSON or YAML config file).
generate_config: GenerateConfig | None
GenerateConfig for calls to the model.
model_roles: dict[str, Model] | None
Named roles for use in get_model().
max_transcripts: int | None
The maximum number of transcripts to process concurrently (this also serves as the default value for max_connections). Defaults to 25.
max_processes: int | None
The maximum number of concurrent processes (for multiprocessing). Defaults to multiprocessing.cpu_count().
limit: int | None
Limit the number of transcripts processed.
shuffle: bool | int | None
Shuffle the order of transcripts (pass an int to set a seed for shuffling).
tags: list[str] | None
One or more tags for this scan.
metadata: dict[str, Any] | None
Metadata for this scan.
log_level: str | None
Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
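The scan() documentation states that when a ScanJob is passed, its options serve as the default options for the scan. One way to picture that layering is sketched below. The helper `resolve_options` is hypothetical, and the precedence shown (explicit non-None arguments override job options, which override built-in defaults) is an assumption inferred from the word "default", not a description of the library's internals.

```python
from typing import Any

# Built-in fallbacks taken from the defaults documented above.
BUILTIN_DEFAULTS: dict[str, Any] = {"max_transcripts": 25, "results": "./scans"}


def resolve_options(job: dict[str, Any], **explicit: Any) -> dict[str, Any]:
    """Layer options: built-in defaults < job options < explicit arguments."""
    resolved = dict(BUILTIN_DEFAULTS)
    # None means "not specified" at every layer, mirroring the `| None` types.
    resolved.update({k: v for k, v in job.items() if v is not None})
    resolved.update({k: v for k, v in explicit.items() if v is not None})
    return resolved


# Hypothetical job options; `results` is left unspecified.
job_options = {"max_transcripts": 10, "limit": 100, "results": None}
options = resolve_options(job_options, limit=5, shuffle=None)
```

Here the job's max_transcripts survives, the explicit limit replaces the job's limit, and results falls back to the built-in default.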
ScanJobConfig
Scan job configuration.
class ScanJobConfig(BaseModel)
Attributes
name: str
Name of scan job (defaults to “job”).
transcripts: str | list[str] | None
Transcripts to scan.
scanners: list[ScannerSpec] | dict[str, ScannerSpec] | None
Scanners to apply to transcripts.
worklist: list[ScannerWork] | None
Transcript ids to process for each scanner (defaults to processing all transcripts).
validation: dict[str, ValidationSet] | None
Validation cases to apply for scanners.
results: str | None
Location to write results (filesystem or S3 bucket). Defaults to “./scans”.
model: str | None
Model to use for scanning by default (individual scanners can always call get_model() to use arbitrary models). If not specified, the value of the SCOUT_SCAN_MODEL environment variable is used.
model_base_url: str | None
Base URL for communicating with the model API.
model_args: dict[str, Any] | str | None
Model creation args (as a dictionary or as a path to a JSON or YAML config file).
generate_config: GenerateConfig | None
GenerateConfig for calls to the model.
model_roles: dict[str, ModelConfig | str] | None
Named roles for use in get_model().
max_transcripts: int | None
The maximum number of transcripts to process concurrently (this also serves as the default value for max_connections). Defaults to 25.
max_processes: int | None
The maximum number of concurrent processes (for multiprocessing). Defaults to multiprocessing.cpu_count().
limit: int | None
Limit the number of transcripts processed.
shuffle: bool | int | None
Shuffle the order of transcripts (pass an int to set a seed for shuffling).
tags: list[str] | None
One or more tags for this scan.
metadata: dict[str, Any] | None
Metadata for this scan.
log_level: Literal['debug', 'http', 'sandbox', 'info', 'warning', 'error', 'critical', 'notset'] | None
Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
ScannerSpec
Scanner used by scan.
class ScannerSpec(BaseModel)
Attributes
name: str
Scanner name.
file: str | None
Scanner source file (if not in a package).
params: dict[str, Any]
Scanner arguments.
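Because ScanJobConfig and ScannerSpec are plain data models, a job configuration can be pictured as a JSON document whose keys follow the attribute names listed above. The sketch below only parses and checks such a document with the standard library. Every value in it (job name, paths, scanner name, parameters) is hypothetical, and whether the library accepts a file in exactly this shape is an assumption.

```python
import json

# Hypothetical configuration; keys follow the ScanJobConfig attribute names.
CONFIG_JSON = """
{
  "name": "example-job",
  "transcripts": "./logs",
  "scanners": [
    {"name": "refusal_scanner", "file": "scanners.py", "params": {"threshold": 0.5}}
  ],
  "max_transcripts": 10,
  "tags": ["nightly"]
}
"""

config = json.loads(CONFIG_JSON)

# Check each scanner entry against the three ScannerSpec attributes.
SCANNER_SPEC_KEYS = {"name", "file", "params"}
for spec in config["scanners"]:
    unknown = set(spec) - SCANNER_SPEC_KEYS
    if unknown:
        raise ValueError(f"unexpected ScannerSpec keys: {sorted(unknown)}")

scanner_names = [spec["name"] for spec in config["scanners"]]
```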
ScannerWork
Definition of work to perform for a scanner.
By default, scanners process all transcripts passed to scan(). Alternatively, you can pass a list of ScannerWork to specify that only particular scanners and transcripts should be processed.
class ScannerWork(BaseModel)
Attributes
scanner: str
Scanner name.
transcripts: list[str]
List of transcript ids.
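A worklist can also be supplied as a file. The sketch below builds a worklist from the two attributes above, writes it as JSON, and reads it back. `ScannerWorkSketch` is a dataclass stand-in for the real model, the scanner names and transcript ids are hypothetical, and the file layout (a top-level array of objects) is an assumption based on the description of the worklist parameter as a file containing a list of ScannerWork.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ScannerWorkSketch:
    """Stand-in mirroring the two ScannerWork attributes."""

    scanner: str
    transcripts: list[str]


worklist = [
    ScannerWorkSketch(scanner="refusal_scanner", transcripts=["t-001", "t-002"]),
    ScannerWorkSketch(scanner="tool_error_scanner", transcripts=["t-002"]),
]

# Write the worklist as JSON, then read it back as it would be loaded from disk.
path = Path(tempfile.mkdtemp()) / "worklist.json"
path.write_text(json.dumps([asdict(w) for w in worklist], indent=2))
loaded = [ScannerWorkSketch(**item) for item in json.loads(path.read_text())]
```

Each entry restricts one scanner to the listed transcript ids, so the second scanner here would only see a single transcript.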
Status
Status
Status of scan job.
@dataclass
class Status
Attributes
ScanOptions
Options used for scan.
class ScanOptions(BaseModel)
Attributes
max_transcripts: int
Maximum number of concurrent transcripts (defaults to 25).
max_processes: int | None
Number of worker processes. Defaults to multiprocessing.cpu_count().
limit: int | None
Transcript limit (maximum number of transcripts to read).
shuffle: bool | int | None
Shuffle order of transcripts.
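The interplay of shuffle and limit can be sketched as follows. `select_transcripts` is a hypothetical helper, not part of the library. It treats True as an unseeded shuffle and an int as a seed, as described for the shuffle option of scan(). Applying the limit after shuffling is an assumption, since the order of the two steps is not stated here.

```python
from __future__ import annotations

import random


def select_transcripts(
    ids: list[str],
    shuffle: bool | int | None = None,
    limit: int | None = None,
) -> list[str]:
    """Return transcript ids, optionally shuffled and then truncated."""
    selected = list(ids)
    if isinstance(shuffle, bool):
        # bool is checked first because it is a subclass of int.
        if shuffle:
            random.shuffle(selected)
    elif shuffle is not None:
        # An int seeds a private generator, so the order is reproducible.
        random.Random(shuffle).shuffle(selected)
    # Assumption: the limit is applied after shuffling.
    return selected if limit is None else selected[:limit]


ids = [f"t-{i:03d}" for i in range(20)]
first = select_transcripts(ids, shuffle=42, limit=5)
second = select_transcripts(ids, shuffle=42, limit=5)
```

Passing the same seed twice selects the same sample, which is useful when a limited scan needs to be repeatable.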
ScanRevision
Git revision for scan.
class ScanRevision(BaseModel)
Attributes
type: Literal['git']
Type of revision (currently only “git”).
origin: str
Revision origin server.
commit: str
Revision commit.
ScanTranscripts
Transcripts targeted by a scan.
class ScanTranscripts(BaseModel)
Attributes
type: str
Transcripts backing store type (currently only ‘eval_log’).
fields: list[TranscriptField]
Data types of transcripts fields.
count: int
Transcript count.
data: str
Transcript data as a CSV.
TranscriptField
Field in transcript data frame.
class TranscriptField(TypedDict, total=False)
Attributes
name: Required[str]
Field name.
type: Required[str]
Field type (“integer”, “number”, “boolean”, “string”, or “datetime”).
tz: NotRequired[str]
Timezone (for “datetime” fields).
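Since ScanTranscripts stores its data as CSV text and describes column types through its fields, the two can be combined to recover typed rows. The sketch below is illustrative only: `parse_rows` and `CONVERTERS` are hypothetical, the sample columns are invented, fields are plain dicts rather than TranscriptField instances, and the text encodings assumed for booleans ("true"/"false") and datetimes (ISO 8601) are not confirmed by this reference. The tz attribute is not handled.

```python
import csv
import io
from datetime import datetime
from typing import Any, Callable

# One converter per documented field type; encodings are assumptions.
CONVERTERS: dict[str, Callable[[str], Any]] = {
    "integer": int,
    "number": float,
    "boolean": lambda s: s.strip().lower() == "true",
    "string": str,
    "datetime": datetime.fromisoformat,
}

# Hypothetical field descriptions and matching CSV data.
fields = [
    {"name": "id", "type": "string"},
    {"name": "score", "type": "number"},
    {"name": "epoch", "type": "integer"},
]
data = "id,score,epoch\nt-001,0.5,1\nt-002,1.0,2\n"


def parse_rows(
    fields: list[dict[str, str]], data: str
) -> list[dict[str, Any]]:
    """Parse CSV text and convert each column using its declared field type."""
    types = {f["name"]: f["type"] for f in fields}
    reader = csv.DictReader(io.StringIO(data))
    return [
        {key: CONVERTERS[types[key]](value) for key, value in row.items()}
        for row in reader
    ]


rows = parse_rows(fields, data)
```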