Transcript API

transcripts_from_logs

Read sample transcripts from eval logs.

def transcripts_from_logs(logs: LogPaths) -> Transcripts
logs LogPaths

Log paths as file(s) or directories.

Transcripts

Collection of transcripts for scanning.

Transcript collections can be filtered using the where(), limit(), and shuffle() methods. Filtering does not modify the collection in place, so reference the filtered transcripts via the return value. For example:

from inspect_scout import transcripts_from_logs, log_metadata as m

transcripts = transcripts_from_logs("./logs")
transcripts = transcripts.where(m.task_name == "cybench")

class Transcripts(abc.ABC)

Methods

where

Filter the transcript collection by a Condition.

def where(self, condition: Condition) -> "Transcripts"
condition Condition

Filter condition.

for_validation

Filter transcripts to only those with IDs matching validation cases.

def for_validation(
    self, validation: ValidationSet | dict[str, ValidationSet]
) -> "Transcripts"
validation ValidationSet | dict[str, ValidationSet]

Validation object containing cases with target IDs.
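The ID-matching idea behind for_validation can be pictured in plain Python. This is a minimal sketch, not the library's implementation: TranscriptStub, keep_validation_ids, and the sample IDs are hypothetical stand-ins, and it assumes each validation case exposes the ID of the transcript it targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptStub:
    """Hypothetical stand-in for a transcript record (only the id matters here)."""

    id: str


def keep_validation_ids(transcripts, case_ids):
    """Return only the transcripts whose id appears in case_ids."""
    wanted = set(case_ids)  # set membership keeps the filter linear in size
    return [t for t in transcripts if t.id in wanted]


transcripts = [TranscriptStub("a1"), TranscriptStub("b2"), TranscriptStub("c3")]
case_ids = ["b2", "c3", "zz"]  # "zz" has no matching transcript and is ignored

selected = keep_validation_ids(transcripts, case_ids)
print([t.id for t in selected])
```

Transcripts without a matching case are dropped, and case IDs without a matching transcript simply select nothing.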

limit

Limit the number of transcripts processed.

def limit(self, n: int) -> "Transcripts"
n int

Maximum number of transcripts to process.

shuffle

Shuffle the order of transcripts.

def shuffle(self, seed: int | None = None) -> "Transcripts"
seed int | None

Random seed for shuffling.
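A seed is typically supplied so that a shuffled order can be reproduced across runs. The sketch below illustrates that idea with the standard library only; shuffled is a hypothetical helper, and whether shuffle() uses the same random number generator internally is an assumption, not something this reference states.

```python
import random


def shuffled(items, seed=None):
    """Return a shuffled copy; the input list is left untouched."""
    result = list(items)
    random.Random(seed).shuffle(result)  # private RNG, global state unaffected
    return result


ids = [f"t{i}" for i in range(10)]

first = shuffled(ids, seed=42)
second = shuffled(ids, seed=42)

# Shuffling and then truncating gives a repeatable random sample.
sample = first[:3]
print(first == second, sample)
```

The same pattern (shuffle with a fixed seed, then limit) is one plausible way to draw a stable random subset of a transcript collection.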

count

Number of transcripts in collection.

@abc.abstractmethod
async def count(self) -> int

index

Index of TranscriptInfo for the collection.

@abc.abstractmethod
async def index(self) -> Iterator[TranscriptInfo]
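Both count and index are coroutines, so they must be awaited. The sketch below shows the calling pattern against a hypothetical in-memory stand-in (InMemoryTranscripts is not part of the library, and its index yields plain strings rather than TranscriptInfo objects).

```python
import asyncio
from typing import Iterator


class InMemoryTranscripts:
    """Hypothetical stand-in exposing the same two async methods."""

    def __init__(self, ids):
        self._ids = list(ids)

    async def count(self) -> int:
        return len(self._ids)

    async def index(self) -> Iterator[str]:
        # Awaiting index() produces an ordinary (synchronous) iterator.
        return iter(self._ids)


async def summarize(collection):
    total = await collection.count()
    iterator = await collection.index()
    return total, list(iterator)


total, seen = asyncio.run(summarize(InMemoryTranscripts(["a", "b", "c"])))
print(total, seen)
```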

TranscriptInfo

Transcript identifier, location, and metadata.

class TranscriptInfo(BaseModel)

Attributes

id str

Globally unique id for transcript (e.g. sample uuid).

source_id str

Globally unique ID for transcript source (e.g. eval_id).

source_uri str

URI for source data (e.g. log file path).

score JsonValue | None

Main score assigned to transcript (optional).

scores dict[str, JsonValue]

All scores assigned to transcript.

variables dict[str, JsonValue]

Variables (e.g. to be used in a prompt template) associated with transcript (e.g. sample metadata).

metadata dict[str, JsonValue]

Transcript source specific metadata (e.g. model, task name, errors, epoch, dataset sample id, limits, etc.).
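As an illustration of how variables might feed a prompt template, the sketch below fills a template from dictionaries shaped like the variables and metadata attributes. The keys, values, and template text are all hypothetical, and string.Template is a stand-in for whatever templating a scanner actually uses.

```python
from string import Template

# Hypothetical values shaped like TranscriptInfo.variables and .metadata.
variables = {"difficulty": "hard", "category": "crypto"}
metadata = {"model": "gpt-4", "task_name": "cybench"}

# Hypothetical prompt template; the placeholder names are illustrative only.
template = Template("Review this $difficulty $category task run by $model.")

prompt = template.substitute(**variables, model=metadata["model"])
print(prompt)
```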

Transcript

Transcript info and transcript content (messages and events).

class Transcript(TranscriptInfo)

Attributes

messages list[ChatMessage]

Main message thread.

events list[Event]

Events from transcript.

Column

Database column with comparison operators.

Supports various predicate functions including like(), not_like(), between(), etc. Additionally supports standard Python equality and comparison operators (e.g. ==, >, etc.).

class Column
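The way comparison operators can produce filter conditions instead of booleans is easiest to see in a toy model. The sketch below is not the library's code: ToyColumn and ToyCondition are hypothetical, and the SQL text they emit is an assumption chosen for readability.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyCondition:
    """Hypothetical condition: a SQL fragment plus its bound parameters."""

    sql: str
    params: tuple

    def __and__(self, other: "ToyCondition") -> "ToyCondition":
        return ToyCondition(
            f"({self.sql} AND {other.sql})", self.params + other.params
        )


class ToyColumn:
    """Hypothetical column whose operators build conditions."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, value: Any) -> ToyCondition:  # type: ignore[override]
        return ToyCondition(f'"{self.name}" = ?', (value,))

    def __gt__(self, value: Any) -> ToyCondition:
        return ToyCondition(f'"{self.name}" > ?', (value,))

    def like(self, pattern: str) -> ToyCondition:
        return ToyCondition(f'"{self.name}" LIKE ?', (pattern,))


task_name = ToyColumn("task_name")
epoch = ToyColumn("epoch")

combined = (task_name == "math") & (epoch > 1)
print(combined.sql, combined.params)
```

The parentheses around each comparison matter: in Python, & binds more tightly than == and >, so omitting them changes how the expression is grouped.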

Methods

in_

Check if value is in a list.

def in_(self, values: list[Any]) -> Condition
values list[Any]

not_in

Check if value is not in a list.

def not_in(self, values: list[Any]) -> Condition
values list[Any]
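In SQL terms, list membership corresponds to IN and NOT IN with one bound parameter per value. The sketch below runs that directly against an in-memory SQLite database; the table, column names, and rows are hypothetical, and the SQL the library actually generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcripts (id TEXT, model TEXT)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [("t1", "model-a"), ("t2", "model-b"), ("t3", "model-c")],
)

values = ["model-a", "model-c"]
placeholders = ", ".join("?" for _ in values)  # one "?" per list element

in_rows = conn.execute(
    f"SELECT id FROM transcripts WHERE model IN ({placeholders}) ORDER BY id",
    values,
).fetchall()

not_in_rows = conn.execute(
    f"SELECT id FROM transcripts WHERE model NOT IN ({placeholders}) ORDER BY id",
    values,
).fetchall()

print(in_rows, not_in_rows)
```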
like

SQL LIKE pattern matching (case-sensitive).

def like(self, pattern: str) -> Condition
pattern str

not_like

SQL NOT LIKE pattern matching (case-sensitive).

def not_like(self, pattern: str) -> Condition
pattern str

ilike

PostgreSQL ILIKE pattern matching (case-insensitive).

Note: For SQLite and DuckDB, this will use LIKE with LOWER() for case-insensitivity.

def ilike(self, pattern: str) -> Condition
pattern str

not_ilike

PostgreSQL NOT ILIKE pattern matching (case-insensitive).

Note: For SQLite and DuckDB, this will use NOT LIKE with LOWER() for case-insensitivity.

def not_ilike(self, pattern: str) -> Condition
pattern str
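The LOWER() rewrite mentioned in the notes above can be tried directly in SQLite. This is a sketch of the general technique with a hypothetical table and rows; the exact SQL the library emits is not shown in this reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcripts (id TEXT, model TEXT)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [("t1", "GPT-4"), ("t2", "gpt-4o"), ("t3", "other")],
)

# In LIKE patterns, % matches any run of characters and _ matches exactly one.
pattern = "gpt-%"

ilike_rows = conn.execute(
    "SELECT id FROM transcripts "
    "WHERE LOWER(model) LIKE LOWER(?) ORDER BY id",
    [pattern],
).fetchall()

not_ilike_rows = conn.execute(
    "SELECT id FROM transcripts "
    "WHERE LOWER(model) NOT LIKE LOWER(?) ORDER BY id",
    [pattern],
).fetchall()

print(ilike_rows, not_ilike_rows)
```

Lowercasing both the column and the pattern makes the match independent of the case used in either one.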
is_null

Check if value is NULL.

def is_null(self) -> Condition

is_not_null

Check if value is not NULL.

def is_not_null(self) -> Condition
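Dedicated NULL checks exist because SQL's ordinary equality never matches NULL. The sketch below demonstrates the difference in SQLite with a hypothetical table; it illustrates standard SQL behaviour rather than the library's generated queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcripts (id TEXT, error TEXT)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [("t1", None), ("t2", "timeout"), ("t3", None)],
)

# Comparing with "=" against NULL yields NULL (not true), so no rows match.
equals_null = conn.execute(
    "SELECT id FROM transcripts WHERE error = NULL"
).fetchall()

is_null = conn.execute(
    "SELECT id FROM transcripts WHERE error IS NULL ORDER BY id"
).fetchall()

is_not_null = conn.execute(
    "SELECT id FROM transcripts WHERE error IS NOT NULL ORDER BY id"
).fetchall()

print(equals_null, is_null, is_not_null)
```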
between

Check if value is between two values.

def between(self, low: Any, high: Any) -> Condition
low Any

Lower bound (inclusive). If None, raises ValueError.

high Any

Upper bound (inclusive). If None, raises ValueError.

not_between

Check if value is not between two values.

def not_between(self, low: Any, high: Any) -> Condition
low Any

Lower bound (inclusive). If None, raises ValueError.

high Any

Upper bound (inclusive). If None, raises ValueError.
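The inclusive bounds and the None check described for between and not_between can be sketched as follows. between_clause is a hypothetical helper written for this example (including its error message); only the inclusive semantics and the ValueError on None come from the descriptions above.

```python
import sqlite3


def between_clause(column, low, high, negate=False):
    """Hypothetical helper mirroring the documented None check."""
    if low is None or high is None:
        raise ValueError("between bounds must not be None")
    op = "NOT BETWEEN" if negate else "BETWEEN"
    return f'"{column}" {op} ? AND ?', [low, high]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcripts (id TEXT, total_tokens INTEGER)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [("t1", 100), ("t2", 500), ("t3", 1000), ("t4", 1500)],
)

sql, params = between_clause("total_tokens", 500, 1000)
inside = conn.execute(
    f"SELECT id FROM transcripts WHERE {sql} ORDER BY id", params
).fetchall()

sql, params = between_clause("total_tokens", 500, 1000, negate=True)
outside = conn.execute(
    f"SELECT id FROM transcripts WHERE {sql} ORDER BY id", params
).fetchall()

try:
    between_clause("total_tokens", None, 1000)
    raised = False
except ValueError:
    raised = True

print(inside, outside, raised)
```

Rows sitting exactly on a bound (500 and 1000 here) count as inside the range.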

Condition

WHERE clause condition that can be combined with others.

class Condition

Methods

to_sql

Generate SQL WHERE clause and parameters.

def to_sql(
    self,
    dialect: Union[
        SQLDialect, Literal["sqlite", "duckdb", "postgres"]
    ] = SQLDialect.SQLITE,
) -> tuple[str, list[Any]]
dialect Union[SQLDialect, Literal['sqlite', 'duckdb', 'postgres']]

Target SQL dialect (sqlite, duckdb, or postgres).
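The (str, list) return shape pairs a WHERE clause with the values to bind to its placeholders, which keeps user-supplied values out of the SQL text. The sketch below uses a hand-written pair as a stand-in for a to_sql() result; the clause text, the "?" placeholder style, and the table are assumptions for the SQLite case, and other dialects use different placeholder syntax.

```python
import sqlite3

# Hand-written stand-in for what a to_sql() call might return for SQLite.
# The exact SQL text the library emits is not shown in this reference.
sql, params = '"task_name" = ? AND "epoch" > ?', ["math", 1]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcripts (id TEXT, task_name TEXT, epoch INTEGER)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [("t1", "math", 1), ("t2", "math", 2), ("t3", "cybench", 2)],
)

# The clause is spliced into the query; the values travel separately.
rows = conn.execute(
    f"SELECT id FROM transcripts WHERE {sql} ORDER BY id", params
).fetchall()

print(rows)
```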

Metadata

Entry point for building metadata filter expressions.

class Metadata

metadata

Metadata selector for where expressions.

Typically aliased to a more compact expression (e.g. m) for use in queries. For example:

from inspect_scout import metadata as m
filter = m.model == "gpt-4"
filter = (m.task_name == "math") & (m.epochs > 1)

metadata = Metadata()

LogMetadata

Typed metadata interface for Inspect log transcripts.

Provides typed properties for standard Inspect log columns while preserving the ability to access custom fields through the base Metadata class methods.

class LogMetadata(Metadata)
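The combination of typed properties and dynamic field access can be modelled with a property on a subclass plus __getattr__ and __getitem__ on the base class. The sketch below is a toy illustration only: ToyColumn, ToyMetadata, and ToyLogMetadata are hypothetical, and the real classes may be implemented differently.

```python
class ToyColumn:
    """Hypothetical column that records its name and builds a clause."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return (f'"{self.name}" > ?', [value])


class ToyMetadata:
    """Dynamic access: any attribute or key is treated as a column."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        return ToyColumn(name)

    def __getitem__(self, name):
        return ToyColumn(name)


class ToyLogMetadata(ToyMetadata):
    """Typed access: well-known columns are declared as properties."""

    @property
    def model(self) -> ToyColumn:
        """Model used for eval."""
        return ToyColumn("model")


m = ToyLogMetadata()

typed = m.model              # resolved by the declared property
dynamic = m["custom_field"]  # resolved by __getitem__
fallback = m.anything_else   # resolved by __getattr__

print(typed.name, dynamic.name, fallback.name, dynamic > 100)
```

Declaring known columns as properties gives editors and type checkers something to complete and verify, while the base class still accepts fields that are only known at runtime.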

Attributes

sample_id Column

Unique id for sample.

eval_id Column

Globally unique id for eval.

log Column

Location that the log file was read from.

eval_created Column

Time eval was created.

eval_tags Column

Tags associated with evaluation run.

eval_metadata Column

Additional eval metadata.

task_name Column

Task name.

task_args Column

Task arguments.

solver Column

Solver name.

solver_args Column

Arguments used for invoking the solver.

model Column

Model used for eval.

generate_config Column

Generate config specified for model instance.

model_roles Column

Model roles.

id Column

Unique id for sample.

epoch Column

Epoch number for sample.

input Column

Sample input.

target Column

Sample target.

sample_metadata Column

Sample metadata.

score Column

Headline score value.

total_tokens Column

Total tokens used for sample.

total_time Column

Total time that the sample was running.

working_time Column

Time spent working (model generation, sandbox calls, etc.).

error Column

Error that halted the sample.

limit Column

Limit that halted the sample.

log_metadata

Log metadata selector for where expressions.

Typically aliased to a more compact expression (e.g. m) for use in queries. For example:

from inspect_scout import log_metadata as m

# typed access to standard fields
filter = m.model == "gpt-4"
filter = (m.task_name == "math") & (m.epoch > 1)

# dynamic access to custom fields
filter = m["custom_field"] > 100

log_metadata = LogMetadata()