Transcript API

Reading

transcripts_from

Read transcripts for scanning.

Transcripts may be stored in a TranscriptsDB or read directly from Inspect eval logs.

def transcripts_from(location: str | Logs) -> Transcripts
location str | Logs

Transcripts location. Either a path to a transcript database or path(s) to Inspect eval logs.
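For example (a minimal sketch; the paths shown are hypothetical):

from inspect_scout import transcripts_from

# read from a transcripts database
transcripts = transcripts_from("./transcripts-db")

# read from a directory of Inspect eval logs
transcripts = transcripts_from("./logs")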

Transcript

Transcript info and transcript content (messages and events).

class Transcript(TranscriptInfo)

Attributes

messages list[ChatMessage]

Main message thread.

events list[Event]

Events from transcript.
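As a sketch, a Transcript's content can be traversed directly once read (assumes transcript is a Transcript instance; ChatMessage and Event are Inspect types):

# iterate the main message thread
for message in transcript.messages:
    print(message.role)

# events are available alongside messages
print(len(transcript.events))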

Transcripts

Collection of transcripts for scanning.

Transcript collections can be filtered using the where(), limit(), and shuffle() methods. Transcripts are not modified in place, so the filtered collection should be referenced via the return value. For example:

from inspect_scout import transcripts_from, columns as c

transcripts = transcripts_from("./logs")
transcripts = transcripts.where(c.task_set == "cybench")
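The limit() and shuffle() methods follow the same pattern, returning a new collection (a minimal sketch using the documented signatures):

transcripts = transcripts.limit(100)
transcripts = transcripts.shuffle(seed=42)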
class Transcripts(abc.ABC)

Methods

where

Filter the transcript collection by a Condition.

def where(self, condition: Condition) -> "Transcripts"
condition Condition

Filter condition.

for_validation

Filter transcripts to only those with IDs matching validation cases.

def for_validation(
    self, validation: ValidationSet | dict[str, ValidationSet]
) -> "Transcripts"
validation ValidationSet | dict[str, ValidationSet]

Validation object containing cases with target IDs.
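For example (a sketch; validation is a ValidationSet constructed elsewhere in the API):

transcripts = transcripts.for_validation(validation)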

limit

Limit the number of transcripts processed.

def limit(self, n: int) -> "Transcripts"
n int

Limit on transcripts.

shuffle

Shuffle the order of transcripts.

def shuffle(self, seed: int | None = None) -> "Transcripts"
seed int | None

Random seed for shuffling.

reader

Read the selected transcripts.

@abc.abstractmethod
def reader(self, snapshot: ScanTranscripts | None = None) -> TranscriptsReader
snapshot ScanTranscripts | None

An optional snapshot which provides hints to make the reader more efficient (e.g. by avoiding a full scan to find transcript_id => filename mappings).

from_snapshot

Restore transcripts from a snapshot.

@staticmethod
@abc.abstractmethod
def from_snapshot(snapshot: ScanTranscripts) -> "Transcripts"
snapshot ScanTranscripts

TranscriptsReader

Read transcripts based on a TranscriptsQuery.

class TranscriptsReader(abc.ABC)

Methods

index

Index of TranscriptInfo for the collection.

@abc.abstractmethod
def index(self) -> AsyncIterator[TranscriptInfo]
read

Read transcript content.

@abc.abstractmethod
async def read(
    self, transcript: TranscriptInfo, content: TranscriptContent
) -> Transcript
transcript TranscriptInfo

Transcript to read.

content TranscriptContent

Content to read (e.g. specific message types).
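Putting index() and read() together, a minimal sketch of reading an entire collection (the content argument is a TranscriptContent value; construct one per your installed version's API):

async def read_all(
    transcripts: Transcripts, content: TranscriptContent
) -> list[Transcript]:
    reader = transcripts.reader()
    results: list[Transcript] = []
    # index() yields TranscriptInfo; read() resolves each to a full Transcript
    async for info in reader.index():
        results.append(await reader.read(info, content))
    return results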

Database

transcripts_db

Read/write interface to transcripts database.

def transcripts_db(location: str) -> TranscriptsDB
location str

Database location (e.g. directory or S3 bucket).

TranscriptsDB

Database of transcripts.

class TranscriptsDB(abc.ABC)

Methods

__init__

Create a transcripts database.

def __init__(self, location: str, where: list[Condition] | None = None) -> None
location str

Database location (e.g. local or S3 file path).

where list[Condition] | None

Optional list of conditions used to filter transcripts.

connect

Connect to transcripts database.

@abc.abstractmethod
async def connect(self) -> None
disconnect

Disconnect from transcripts database.

@abc.abstractmethod
async def disconnect(self) -> None
insert

Insert transcripts into database.

@abc.abstractmethod
async def insert(
    self,
    transcripts: Iterable[Transcript]
    | AsyncIterable[Transcript]
    | Transcripts
    | TranscriptsSource
    | pa.RecordBatchReader,
) -> None
transcripts Iterable[Transcript] | AsyncIterable[Transcript] | Transcripts | TranscriptsSource | pa.RecordBatchReader

Transcripts to insert (iterable, async iterable, or source).
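For example, copying transcripts from Inspect eval logs into a database (a sketch; the locations are hypothetical):

db = transcripts_db("./transcripts-db")
await db.connect()
try:
    # insert() accepts a Transcripts collection directly
    await db.insert(transcripts_from("./logs"))
finally:
    await db.disconnect()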

transcript_ids

Get transcript IDs matching conditions.

Optimized method that returns only transcript IDs without loading full metadata. Default implementation uses select(), but subclasses can override for better performance.

@abc.abstractmethod
async def transcript_ids(
    self,
    where: list[Condition] | None = None,
    limit: int | None = None,
    shuffle: bool | int = False,
) -> dict[str, str | None]
where list[Condition] | None

Condition(s) to filter by.

limit int | None

Maximum number to return.

shuffle bool | int

Randomly shuffle results (pass int for reproducible seed).
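For example (a sketch; per the signature above, the result maps transcript IDs to an optional string value):

ids = await db.transcript_ids(where=[c.task_set == "cybench"], limit=100)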

select

Select transcripts matching a condition.

@abc.abstractmethod
def select(
    self,
    where: list[Condition] | None = None,
    limit: int | None = None,
    shuffle: bool | int = False,
) -> AsyncIterator[TranscriptInfo]
where list[Condition] | None

Condition(s) to select for.

limit int | None

Maximum number to select.

shuffle bool | int

Randomly shuffle transcripts selected (pass int for reproducible seed).
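For example, iterating over a filtered selection (a sketch; assumes TranscriptInfo exposes a transcript_id attribute):

from inspect_scout import columns as c

# select() yields TranscriptInfo rather than full transcript content
async for info in db.select(where=[c.task_set == "cybench"], limit=10):
    print(info.transcript_id)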

read

Read transcript content.

@abc.abstractmethod
async def read(self, t: TranscriptInfo, content: TranscriptContent) -> Transcript
t TranscriptInfo

Transcript to read.

content TranscriptContent

Content to read (messages, events, etc.)

TranscriptsSource

Async iterator of transcripts.

class TranscriptsSource(Protocol):
    def __call__(self) -> AsyncIterator[Transcript]
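Since any callable returning an async iterator satisfies this protocol, an async generator function works directly as a source. A sketch (some_transcripts is a hypothetical in-memory list of Transcript objects):

from typing import AsyncIterator

async def in_memory_source() -> AsyncIterator[Transcript]:
    for transcript in some_transcripts:  # hypothetical list
        yield transcript

# usable anywhere a TranscriptsSource is accepted, e.g. TranscriptsDB.insert()
await db.insert(in_memory_source)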

Filtering

Column

Database column with comparison operators.

Supports various predicate functions including like(), not_like(), between(), etc. Additionally supports standard Python equality and comparison operators (e.g. ==, >, etc.).

class Column

Methods

in_

Check if value is in a list.

def in_(self, values: list[Any]) -> Condition
values list[Any]
not_in

Check if value is not in a list.

def not_in(self, values: list[Any]) -> Condition
values list[Any]
like

SQL LIKE pattern matching (case-sensitive).

def like(self, pattern: str) -> Condition
pattern str
not_like

SQL NOT LIKE pattern matching (case-sensitive).

def not_like(self, pattern: str) -> Condition
pattern str
ilike

PostgreSQL ILIKE pattern matching (case-insensitive).

Note: For SQLite and DuckDB, this will use LIKE with LOWER() for case-insensitivity.

def ilike(self, pattern: str) -> Condition
pattern str
not_ilike

PostgreSQL NOT ILIKE pattern matching (case-insensitive).

Note: For SQLite and DuckDB, this will use NOT LIKE with LOWER() for case-insensitivity.

def not_ilike(self, pattern: str) -> Condition
pattern str
is_null

Check if value is NULL.

def is_null(self) -> Condition
is_not_null

Check if value is not NULL.

def is_not_null(self) -> Condition
between

Check if value is between two values.

def between(self, low: Any, high: Any) -> Condition
low Any

Lower bound (inclusive). If None, raises ValueError.

high Any

Upper bound (inclusive). If None, raises ValueError.

not_between

Check if value is not between two values.

def not_between(self, low: Any, high: Any) -> Condition
low Any

Lower bound (inclusive). If None, raises ValueError.

high Any

Upper bound (inclusive). If None, raises ValueError.
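A few predicate examples combining the methods above (a sketch; column names come from the Columns interface described below, and "gaia" is a hypothetical task set):

from inspect_scout import columns as c

cond = c.model.like("gpt-4%")
cond = c.task_set.in_(["cybench", "gaia"])
cond = c.score.between(0.5, 1.0)
cond = c.error.is_null()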

Condition

WHERE clause condition that can be combined with others.

class Condition(BaseModel)

Attributes

left Union[str, 'Condition', None]

Column name (simple) or left operand (compound).

operator Union[Operator, LogicalOperator, None]

Comparison operator (simple) or logical operator (compound).

right Union['Condition', list[ScalarValue], tuple[ScalarValue, ScalarValue], ScalarValue]

Comparison value (simple) or right operand (compound).

is_compound bool

True for AND/OR/NOT conditions, False for simple comparisons.

params list[ScalarValue]

SQL parameters extracted from the condition for parameterized queries.

Methods

to_sql

Generate SQL WHERE clause and parameters.

def to_sql(
    self,
    dialect: Union[
        SQLDialect, Literal["sqlite", "duckdb", "postgres"]
    ] = SQLDialect.SQLITE,
) -> tuple[str, list[Any]]
dialect Union[SQLDialect, Literal['sqlite', 'duckdb', 'postgres']]

Target SQL dialect (sqlite, duckdb, or postgres).
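For example, rendering a condition to a parameterized clause (sketch):

from inspect_scout import columns as c

# sql is a WHERE-clause fragment; params holds the extracted values
sql, params = (c.model == "gpt-4").to_sql("sqlite")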

Columns

Entry point for building filter expressions.

Supports both dot notation and bracket notation for accessing columns:

from inspect_scout import columns as c

c.column_name
c["column_name"]
c["nested.json.path"]
class Columns

Attributes

transcript_id Column

Globally unique identifier for transcript.

source_type Column

Type of transcript source (e.g. “eval_log”, “weave”, etc.).

source_id Column

Globally unique identifier of transcript source (e.g. ‘eval_id’ in Inspect logs).

source_uri Column

URI for source data (e.g. full path to the Inspect log file or weave op).

date Column

Date transcript was created.

task_set Column

Set from which transcript task was drawn (e.g. benchmark name).

task_id Column

Identifier for task (e.g. dataset sample id).

task_repeat Column

Repeat for a given task id within a task set (e.g. epoch).

agent Column

Agent name.

agent_args Column

Agent args.

model Column

Model used for eval.

model_options Column

Generation options for model.

score Column

Headline score value.

success Column

Reduction of ‘score’ to True/False success.

total_time Column

Total execution time.

error Column

Error that halted execution.

limit Column

Limit that halted execution.

columns

Column selector for where expressions.

Typically aliased to a more compact name (e.g. c) for use in queries. For example:

from inspect_scout import columns as c
filter = c.model == "gpt-4"
filter = (c.task_set == "math") & (c.epochs > 1)
columns = Columns()

LogColumns

Typed column interface for Inspect log transcripts.

Provides typed properties for standard Inspect log columns while preserving the ability to access custom fields through the base Metadata class methods.

class LogColumns(Columns)

Attributes

sample_id Column

Unique id for sample.

eval_id Column

Globally unique id for eval.

eval_status Column

Status of eval.

log Column

Location that the log file was read from.

eval_tags Column

Tags associated with evaluation run.

eval_metadata Column

Additional eval metadata.

task_args Column

Task arguments.

generate_config Column

Generate config specified for model instance.

model_roles Column

Model roles.

id Column

Unique id for sample.

epoch Column

Epoch number for sample.

input Column

Sample input.

target Column

Sample target.

sample_metadata Column

Sample metadata.

working_time Column

Time spent working (model generation, sandbox calls, etc.).

log_columns

Log columns selector for where expressions.

Typically aliased to a more compact name (e.g. c) for use in queries. For example:

from inspect_scout import log_columns as c

# typed access to standard fields
filter = c.model == "gpt-4"
filter = (c.task_set == "math") & (c.epochs > 1)

# dynamic access to custom fields
filter = c["custom_field"] > 100
log_columns = LogColumns()