Results
scan_list
List completed and pending scans.
def scan_list(scans_location: str) -> list[Status]
scans_location: str
Location of scans to list.
scan_status
Status of scan.
def scan_status(scan_location: str) -> Status
scan_location: str
Location to get status for (e.g. directory or s3 bucket)
Status
Status of scan job.
@dataclass
class Status
Attributes
Summary
Summary of scan results.
class Summary(BaseModel)
Attributes
complete: bool
Is the scan complete?
scanners: dict[str, ScannerSummary]
Summary for each scanner.
scan_results_df
Scan results as pandas data frames.
def scan_results_df(
scan_location: str,
*,
scanner: str | None = None,
rows: Literal["results", "transcripts"] = "results",
exclude_columns: list[str] | None = None,
) -> ScanResultsDF
scan_location: str
Location of scan (e.g. directory or s3 bucket).
scanner: str | None
Scanner name (defaults to all scanners).
rows: Literal['results', 'transcripts']
Row granularity. Specify “results” to yield a row for each scanner result (potentially multiple per transcript); specify “transcripts” to yield a row for each transcript (in which case multiple results will be packed into the value field as a JSON list of Result).
exclude_columns: list[str] | None
List of column names to exclude when reading parquet files. Useful for reducing memory usage by skipping large unused columns.
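The difference between the two row granularities can be sketched in plain Python. This does not call scan_results_df; the row layout, the transcript_id key, and the pack_by_transcript helper are hypothetical. Only the idea of packing multiple results into a JSON list in the value field comes from the description above.

```python
import json
from collections import defaultdict

# Hypothetical per-result rows, roughly what rows="results" yields:
# one row per scanner result, possibly several per transcript.
result_rows = [
    {"transcript_id": "t1", "value": True},
    {"transcript_id": "t1", "value": False},
    {"transcript_id": "t2", "value": True},
]


def pack_by_transcript(rows):
    """Group per-result rows into one row per transcript (illustrative only)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["transcript_id"]].append({"value": row["value"]})
    # Pack each transcript's results into a JSON list stored under "value".
    return [
        {"transcript_id": tid, "value": json.dumps(results)}
        for tid, results in grouped.items()
    ]


# Roughly what rows="transcripts" describes: one row per transcript.
transcript_rows = pack_by_transcript(result_rows)
```

With rows="transcripts", downstream code would need to decode the value field (for example with json.loads) before working with individual results.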
ScanResultsDF
Scan results as pandas data frames.
The scanners mapping provides lazy access to DataFrames: each DataFrame is materialized only when its key is accessed. This allows efficient access to specific scanner results without loading all data upfront.
@dataclass
class ScanResultsDF(Status)
Attributes
complete: bool
Is the job complete (all transcripts scanned).
spec: ScanSpec
Scan spec (transcripts, scanners, options).
location: str
Location of scan directory.
summary: Summary
Summary of scan (results, errors, tokens, etc.)
errors: list[Error]
Errors during last scan attempt.
scanners: Mapping[str, pd.DataFrame]
Mapping of scanner name to pandas data frame (lazily loaded).
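A lazily loaded mapping of this kind can be sketched as follows. This is not the library's implementation: the LazyMapping class, the loader functions, and the scanner names are hypothetical, and caching the value after first access is an assumption. Lists stand in for data frames so the sketch has no dependencies.

```python
from collections.abc import Iterator, Mapping
from typing import Callable


class LazyMapping(Mapping[str, object]):
    """Mapping whose values are built on first access and then cached."""

    def __init__(self, loaders: dict[str, Callable[[], object]]) -> None:
        self._loaders = loaders
        self._cache: dict[str, object] = {}

    def __getitem__(self, key: str) -> object:
        if key not in self._cache:
            # Materialize only when the key is actually requested.
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._loaders)

    def __len__(self) -> int:
        return len(self._loaders)


loaded: list[str] = []  # records which scanners were actually read


def make_loader(name: str) -> Callable[[], object]:
    def load() -> object:
        loaded.append(name)
        return [f"{name}-row"]  # stand-in for a data frame

    return load


scanners = LazyMapping(
    {"refusal": make_loader("refusal"), "deception": make_loader("deception")}
)
names = list(scanners)       # listing keys loads nothing
first = scanners["refusal"]  # loads only the "refusal" data
```

Iterating over the keys is cheap in this sketch; only indexing triggers a load, which is the property the description above relies on.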
scan_results_arrow
Scan results as Arrow.
def scan_results_arrow(
scan_location: str,
) -> ScanResultsArrow
scan_location: str
Location of scan (e.g. directory or s3 bucket).
ScanResultsArrow
Scan results as Arrow.
@dataclass
class ScanResultsArrow(Status)
Attributes
complete: bool
Is the job complete (all transcripts scanned).
spec: ScanSpec
Scan spec (transcripts, scanners, options).
location: str
Location of scan directory.
summary: Summary
Summary of scan (results, errors, tokens, etc.)
errors: list[Error]
Errors during last scan attempt.
scanners: list[str]
Scanner names.
Methods
- reader
Acquire a reader for the specified scanner.
The returned reader is a context manager that should be acquired before reading.
@abc.abstractmethod
def reader(
self,
scanner: str,
streaming_batch_size: int = 1024,
exclude_columns: list[str] | None = None,
) -> pa.RecordBatchReader
scanner: str
streaming_batch_size: int
exclude_columns: list[str] | None
Validation
validation_set
Create a validation set by reading cases from a file or data frame.
def validation_set(
cases: str | Path | pd.DataFrame,
predicate: ValidationPredicate | None = "eq",
split: str | list[str] | None = None,
) -> ValidationSet
cases: str | Path | pd.DataFrame
Path to a CSV, YAML, JSON, or JSONL file with validation cases, or data frame with validation cases.
predicate: ValidationPredicate | None
Predicate for comparing scanner results to validation targets (defaults to equality comparison). For single-value targets, compares value to target directly. For dict targets, string/single-value predicates are applied to each key, while multi-value predicates receive the full dicts.
split: str | list[str] | None
Optional split name(s) to filter cases by. Only cases with matching split values will be included. Can be a single split name or a list of split names. Cases without a split field are excluded when filtering.
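The split filtering rule can be sketched in plain Python. The filter_by_split helper and the dict-based case layout are hypothetical stand-ins, not the library's code; the sketch only mirrors the behavior described above, including dropping cases that have no split value whenever a filter is given.

```python
def filter_by_split(cases, split=None):
    """Keep cases whose 'split' matches the requested name(s)."""
    if split is None:
        return list(cases)  # no filtering requested
    wanted = {split} if isinstance(split, str) else set(split)
    # A case without a split value never matches, so it is dropped here.
    return [case for case in cases if case.get("split") in wanted]


cases = [
    {"id": "a", "target": True, "split": "dev"},
    {"id": "b", "target": False, "split": "test"},
    {"id": "c", "target": True},  # no split field
]

dev_only = filter_by_split(cases, "dev")
dev_and_test = filter_by_split(cases, ["dev", "test"])
everything = filter_by_split(cases)
```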
ValidationSet
Validation set for a scanner.
class ValidationSet(BaseModel)
Attributes
cases: list[ValidationCase]
Cases to compare scanner values against.
predicate: ValidationPredicate | None
Predicate for comparing scanner results to validation targets.
For single-value targets, the predicate compares value to target directly. For dict targets, string/single-value predicates are applied to each key, while multi-value predicates receive the full dicts.
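For dict targets, per-key application of a single-value predicate might look like the sketch below. The apply_per_key helper is hypothetical. Taking the keys from the target, and treating a key missing from the value as None, are assumptions rather than documented behavior.

```python
import operator


def apply_per_key(predicate, value: dict, target: dict) -> dict[str, bool]:
    """Apply a single-value predicate to each key of a dict target."""
    # Assumption: keys come from the target; a missing value key compares as None.
    return {
        key: bool(predicate(value.get(key), expected))
        for key, expected in target.items()
    }


value = {"severity": 3, "category": "refusal"}
target = {"severity": 3, "category": "deception"}

# operator.eq stands in for the "eq" predicate.
per_key = apply_per_key(operator.eq, value, target)
```

A multi-value predicate, by contrast, would be called once with both full dicts instead of once per key.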
split: str | list[str] | None
Active split filter applied to this validation set (informational).
ValidationCase
Validation case for comparing to scanner results.
A ValidationCase specifies the ground truth for a scan of a particular id (e.g. transcript id, message id, etc.).
Use target for single-value or dict validation. Use labels for validating resultsets with label-specific expectations.
class ValidationCase(BaseModel)
Attributes
id: str | list[str]
Target id (e.g. transcript id, message id, etc.)
target: JsonValue | None
Target value that the scanner is expected to output.
For single-value results, this is the expected value. For dict-valued results, this is a dict of expected values.
labels: dict[str, bool] | None
Label presence/absence expectations for resultset validation.
Maps label names to boolean expectations:
- true: expect at least one result with a positive (non-negative) value
- false: expect no results, or all results have negative values
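The two label expectations can be sketched as a small check. The label_matches helper is hypothetical, and approximating a "positive" value with Python truthiness is an assumption; the library may define positive and negative values differently.

```python
def label_matches(expected: bool, values: list) -> bool:
    """Check one label expectation against the values of results with that label."""
    # Assumption: "positive" is approximated here by Python truthiness.
    any_positive = any(bool(v) for v in values)
    # True: at least one positive result. False: no results, or none positive.
    return any_positive if expected else not any_positive


# Values of all results carrying a given label (hypothetical data).
found_positive = label_matches(True, [False, True])
missing_when_expected = label_matches(True, [])
absent_as_expected = label_matches(False, [])
unexpected_positive = label_matches(False, [True])
```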
predicate: PredicateType | None
Predicate for comparing scanner result to target (e.g., ‘eq’, ‘gte’, ‘contains’).
When set, this per-case predicate overrides the global predicate on ValidationSet.
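The override rule amounts to a simple precedence check, sketched below. The effective_predicate helper is hypothetical and only illustrates the order of precedence described above.

```python
def effective_predicate(case_predicate, set_predicate):
    """Pick the predicate to use for one case."""
    # A per-case predicate, when set, takes precedence over the set-level one.
    return case_predicate if case_predicate is not None else set_predicate


overridden = effective_predicate("gte", "eq")  # case sets its own predicate
inherited = effective_predicate(None, "eq")    # case falls back to the set
```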
split: str | None
Optional split name for organizing cases (e.g., ‘dev’, ‘test’, ‘train’).
Methods
- coerce_labels_to_bool
Coerce label values to boolean for backwards compatibility.
@field_validator("labels", mode="before")
@classmethod
def coerce_labels_to_bool(cls, v: Any) -> dict[str, bool] | None
v: Any
PredicateType
String name of a built-in validation predicate.
PredicateType: TypeAlias = Literal[
"gt",
"gte",
"lt",
"lte",
"eq",
"ne",
"contains",
"startswith",
"endswith",
"icontains",
"iequals",
]
PredicateFn
Function that implements a validation predicate.
PredicateFn: TypeAlias = Callable[
[Result, JsonValue], Awaitable[bool | dict[str, bool]]
]
ValidationPredicate
Predicate used to compare scanner result with target value.
ValidationPredicate: TypeAlias = PredicateType | PredicateFn
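Because a ValidationPredicate is either a built-in name or an async function, code that evaluates one has to handle both forms. The sketch below shows one way to do that under stated assumptions. The BUILTINS table, run_predicate, and within_one are hypothetical; the semantics given to each name (for example, case-insensitive matching for the i-prefixed names) are assumed, not taken from the library; and a plain value stands in for the Result object that a real PredicateFn receives.

```python
import asyncio
import operator
from typing import Any, Callable

# Assumed semantics for the built-in names (not the library's implementation).
BUILTINS: dict[str, Callable[[Any, Any], bool]] = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "contains": lambda v, t: str(t) in str(v),
    "startswith": lambda v, t: str(v).startswith(str(t)),
    "endswith": lambda v, t: str(v).endswith(str(t)),
    "icontains": lambda v, t: str(t).lower() in str(v).lower(),
    "iequals": lambda v, t: str(v).lower() == str(t).lower(),
}


async def run_predicate(predicate, value, target):
    """Evaluate either a built-in predicate name or a custom async function."""
    if isinstance(predicate, str):
        return bool(BUILTINS[predicate](value, target))
    return await predicate(value, target)


async def within_one(value, target) -> bool:
    # Hypothetical custom predicate: numeric match with a tolerance of 1.
    return abs(value - target) <= 1


async def main() -> list:
    return [
        await run_predicate("gte", 5, 3),
        await run_predicate("icontains", "Access DENIED", "denied"),
        await run_predicate("startswith", "abc", "b"),
        await run_predicate(within_one, 4, 5),
    ]


checks = asyncio.run(main())
```

A custom function is useful when none of the built-in names fits, such as the tolerance-based comparison here.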