Results
scan_list
List completed and pending scans.
def scan_list(scans_location: str) -> list[Status]
scans_location: str
Location of scans to list.
scan_status
Status of scan.
def scan_status(scan_location: str) -> Status
scan_location: str
Location to get status for (e.g. directory or s3 bucket)
Status
Status of scan job.
@dataclass
class Status
Attributes
Summary
Summary of scan results.
class Summary(BaseModel)
Attributes
complete: bool
Is the scan complete?
scanners: dict[str, ScannerSummary]
Summary for each scanner.
scan_results_df
Scan results as pandas data frames.
def scan_results_df(
scan_location: str,
*,
scanner: str | None = None,
rows: Literal["results", "transcripts"] = "results",
) -> ScanResultsDF
scan_location: str
Location of scan (e.g. directory or s3 bucket).
scanner: str | None
Scanner name (defaults to all scanners).
rows: Literal['results', 'transcripts']
Row granularity. Specify “results” to yield a row for each scanner result (potentially multiple per transcript); specify “transcripts” to yield a row for each transcript (in which case multiple results are packed into the value field as a JSON list of Result).
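The difference between the two row granularities can be illustrated with a small standard-library sketch. It does not call the library: the sample data, the `transcript_id` column name, and the helper functions are hypothetical, and the packed items are plain dicts standing in for Result.

```python
import json
from collections import defaultdict

# Hypothetical scanner output: (transcript_id, value) pairs.
results = [
    ("t1", "refusal"),
    ("t1", "hedging"),
    ("t2", "refusal"),
]


def rows_by_result(results):
    """One row per scanner result, as with rows="results"."""
    return [{"transcript_id": tid, "value": value} for tid, value in results]


def rows_by_transcript(results):
    """One row per transcript, as with rows="transcripts".

    All results for a transcript are packed into `value` as a JSON list.
    """
    grouped = defaultdict(list)
    for tid, value in results:
        grouped[tid].append({"value": value})
    return [
        {"transcript_id": tid, "value": json.dumps(items)}
        for tid, items in grouped.items()
    ]


per_result = rows_by_result(results)
per_transcript = rows_by_transcript(results)
```

Here `per_result` has three rows, while `per_transcript` has two, with the `value` of the `t1` row holding both of its results.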
ScanResultsDF
Scan results as pandas data frames.
The scanners mapping provides lazy access to DataFrames - each DataFrame is only materialized when its key is accessed. This allows efficient access to specific scanner results without loading all data upfront.
@dataclass
class ScanResultsDF(Status)
Attributes
scanners: Mapping[str, pd.DataFrame]
Mapping of scanner name to pandas data frame (lazily loaded).
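The lazy-access behavior described for the scanners mapping can be sketched roughly as follows. This is not the library's implementation: `LazyMapping`, `make_loader`, and the scanner names are hypothetical, plain lists stand in for DataFrames, and caching after the first access is an assumption.

```python
from collections.abc import Mapping
from typing import Callable, Iterator


class LazyMapping(Mapping):
    """Mapping whose values are built on first access and then cached."""

    def __init__(self, loaders: dict[str, Callable[[], object]]) -> None:
        self._loaders = loaders
        self._cache: dict[str, object] = {}

    def __getitem__(self, key: str) -> object:
        if key not in self._cache:
            # Raises KeyError for unknown keys, like a regular dict.
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._loaders)

    def __len__(self) -> int:
        return len(self._loaders)


loaded: list[str] = []  # records which scanners were actually materialized


def make_loader(name: str) -> Callable[[], list[dict]]:
    def load() -> list[dict]:
        loaded.append(name)
        return [{"scanner": name}]

    return load


scanners = LazyMapping({name: make_loader(name) for name in ("refusal", "toxicity")})
names = list(scanners)       # listing keys loads nothing
first = scanners["refusal"]  # only this entry is materialized
```

After this runs, `loaded` contains only `"refusal"`: iterating over the keys did not trigger any loading.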
scan_results_arrow
Scan results as Arrow.
def scan_results_arrow(
scan_location: str,
) -> ScanResultsArrow
scan_location: str
Location of scan (e.g. directory or s3 bucket).
ScanResultsArrow
Scan results as Arrow.
@dataclass
class ScanResultsArrow(Status)
Attributes
scanners: list[str]
Scanner names.
Methods
reader
Acquire a reader for the specified scanner.
The returned reader is a context manager that should be acquired before reading.
@abc.abstractmethod
def reader(
self,
scanner: str,
streaming_batch_size: int = 1024,
exclude_columns: list[str] | None = None,
) -> pa.RecordBatchReader
scanner: str
streaming_batch_size: int
exclude_columns: list[str] | None
Validation
validation_set
Create a validation set by reading cases from a file or data frame.
def validation_set(
cases: str | Path | pd.DataFrame,
predicate: ValidationPredicate | None = "eq",
) -> ValidationSet
cases: str | Path | pd.DataFrame
Path to a CSV, YAML, JSON, or JSONL file with validation cases, or a data frame with validation cases.
predicate: ValidationPredicate | None
Predicate for comparing scanner results to validation targets (defaults to equality comparison). For single-value targets, compares value to target directly. For dict targets, string/single-value predicates are applied to each key, while multi-value predicates receive the full dicts.
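How a predicate might be applied to single-value versus dict targets can be sketched as below. This is an illustration, not the library's code: the `check` helper and its `multi_value` flag are hypothetical, and requiring every key in the target to pass is an assumption.

```python
import operator
from typing import Any, Callable


def check(
    value: Any,
    target: Any,
    predicate: Callable[[Any, Any], bool],
    multi_value: bool = False,
) -> bool:
    """Compare a scanner value to a validation target.

    `multi_value` is a hypothetical flag marking predicates that expect
    to receive the full dicts rather than one key at a time.
    """
    if isinstance(target, dict) and not multi_value:
        if not isinstance(value, dict):
            return False
        # Single-value predicate: apply it per key (assumed: all must pass).
        return all(
            key in value and predicate(value[key], expected)
            for key, expected in target.items()
        )
    # Single-value target, or a multi-value predicate given the full dicts.
    return predicate(value, target)


def same_keys(value: dict, target: dict) -> bool:
    """Example multi-value predicate: compares the key sets only."""
    return set(value) == set(target)


single = check(3, 3, operator.eq)
per_key_pass = check({"a": 1, "b": 2}, {"a": 1, "b": 2}, operator.eq)
per_key_fail = check({"a": 1, "b": 2}, {"a": 1, "b": 3}, operator.eq)
full_dicts = check({"a": 1, "b": 2}, {"a": 9, "b": 9}, same_keys, multi_value=True)
```

The last call passes even though the values differ, because `same_keys` sees both dicts whole and only compares their keys.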
ValidationSet
Validation set for a scanner.
class ValidationSet(BaseModel)
Attributes
cases: list[ValidationCase]
Cases to compare scanner values against.
predicate: ValidationPredicate | None
Predicate for comparing scanner results to validation targets.
For single-value targets, the predicate compares value to target directly. For dict targets, string/single-value predicates are applied to each key, while multi-value predicates receive the full dicts.
ValidationCase
Validation case for comparing to scanner results.
A ValidationCase specifies the ground truth for a scan of a particular id (e.g. transcript id, message id, etc.).
Use target for single-value or dict validation. Use labels for validating resultsets with label-specific expectations.
class ValidationCase(BaseModel)
Attributes
id: str | list[str]
Target id (e.g. transcript id, message id, etc.)
target: JsonValue | None
Target value that the scanner is expected to output.
For single-value results, this is the expected value. For dict-valued results, this is a dict of expected values.
labels: dict[str, JsonValue] | None
Label-specific target values for resultset validation.
Maps result labels to their expected values. Used when validating scanners that return multiple labeled results per transcript.
Methods
model_post_init
Validate that exactly one of target or labels is set.
def model_post_init(self, __context: Any) -> None
__context: Any
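The "exactly one of target or labels" rule can be sketched with a standard-library dataclass standing in for the Pydantic model. `CaseSketch` is a hypothetical name, `__post_init__` plays the role of `model_post_init`, and treating `None` as "not set" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class CaseSketch:
    """Hypothetical stand-in for ValidationCase (not the library class)."""

    id: Union[str, list[str]]
    target: Any = None
    labels: Optional[dict[str, Any]] = None

    def __post_init__(self) -> None:
        has_target = self.target is not None
        has_labels = self.labels is not None
        # Both set or both unset: neither case is a valid validation case.
        if has_target == has_labels:
            raise ValueError("Specify exactly one of 'target' or 'labels'.")


single_value_case = CaseSketch(id="t1", target=True)
labeled_case = CaseSketch(id="t2", labels={"deception": True, "refusal": False})
```

Constructing a case with both fields, or with neither, raises `ValueError` in this sketch.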
ValidationPredicate
Predicate used to compare scanner result with target value.
ValidationPredicate: TypeAlias = (
Literal[
"gt",
"gte",
"lt",
"lte",
"eq",
"ne",
"contains",
"startswith",
"endswith",
"icontains",
"iequals",
]
| PredicateFn
)
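One plausible reading of the named predicates is a lookup table of comparison functions, sketched below. The semantics are assumptions inferred from the names (the scanner value on the left, the target on the right, string predicates applied to string forms, and an `i` prefix meaning case-insensitive); `PREDICATES` and `resolve` are hypothetical helpers, not part of the library.

```python
import operator
from typing import Any, Callable, Union

PredicateCallable = Callable[[Any, Any], bool]

# Assumed semantics: predicate(value, target), e.g. "gt" means value > target.
PREDICATES: dict[str, PredicateCallable] = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "contains": lambda value, target: str(target) in str(value),
    "startswith": lambda value, target: str(value).startswith(str(target)),
    "endswith": lambda value, target: str(value).endswith(str(target)),
    "icontains": lambda value, target: str(target).lower() in str(value).lower(),
    "iequals": lambda value, target: str(value).lower() == str(target).lower(),
}


def resolve(predicate: Union[str, PredicateCallable]) -> PredicateCallable:
    """Return a callable for a predicate name; pass callables through unchanged."""
    return predicate if callable(predicate) else PREDICATES[predicate]


def close_enough(value: float, target: float) -> bool:
    """Example of a custom predicate function."""
    return abs(value - target) < 0.1
```

Under these assumptions, `resolve("icontains")("Hello World", "WORLD")` is true while `resolve("contains")("Hello World", "WORLD")` is false, and a custom function such as `close_enough` is used as given.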