Transcript API
transcripts_from_logs
Read sample transcripts from eval logs.
def transcripts_from_logs(logs: LogPaths) -> Transcripts
logs: LogPaths
Log paths as file(s) or directories.
Transcripts
Collection of transcripts for scanning.
Transcript collections can be filtered using the where(), limit(), and shuffle() methods. The transcripts are not modified in place, so the filtered collection must be taken from the return value. For example:
from inspect_scout import transcripts_from_logs, log_metadata as m
transcripts = transcripts_from_logs("./logs")
transcripts = transcripts.where(m.task_name == "cybench")
class Transcripts(abc.ABC)
Methods
- where
Filter the transcript collection by a Condition.
def where(self, condition: Condition) -> "Transcripts"
condition: Condition
Filter condition.
- for_validation
Filter transcripts to only those with IDs matching validation cases.
def for_validation(self, validation: ValidationSet | dict[str, ValidationSet]) -> "Transcripts"
validation: ValidationSet | dict[str, ValidationSet]
Validation object containing cases with target IDs.
- limit
Limit the number of transcripts processed.
def limit(self, n: int) -> "Transcripts"
n: int
Limit on transcripts.
- shuffle
Shuffle the order of transcripts.
def shuffle(self, seed: int | None = None) -> "Transcripts"
seed: int | None
Random seed for shuffling.
- count
Number of transcripts in collection.
@abc.abstractmethod
async def count(self) -> int
- index
Index of TranscriptInfo for the collection.
@abc.abstractmethod
async def index(self) -> Iterator[TranscriptInfo]
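The "not modified in place" behaviour of where(), limit(), and shuffle() can be illustrated with a small stand-in. This is only a sketch: ToyTranscripts, its fields, and resolve() are hypothetical names invented here, and the order in which filters, shuffling, and the limit are applied is an assumption, not the library's documented behaviour.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class ToyTranscripts:
    """Hypothetical stand-in for a transcript collection (not the library class)."""

    ids: tuple[str, ...]
    predicates: tuple[Callable[[str], bool], ...] = ()
    max_items: int | None = None
    shuffled: bool = False
    shuffle_seed: int | None = None

    def where(self, predicate: Callable[[str], bool]) -> ToyTranscripts:
        # Return a copy carrying one more predicate; self is left unchanged.
        return replace(self, predicates=self.predicates + (predicate,))

    def limit(self, n: int) -> ToyTranscripts:
        return replace(self, max_items=n)

    def shuffle(self, seed: int | None = None) -> ToyTranscripts:
        return replace(self, shuffled=True, shuffle_seed=seed)

    def resolve(self) -> list[str]:
        # Assumed order for this sketch: filter, then shuffle, then limit.
        items = [i for i in self.ids if all(p(i) for p in self.predicates)]
        if self.shuffled:
            random.Random(self.shuffle_seed).shuffle(items)
        return items if self.max_items is None else items[: self.max_items]


everything = ToyTranscripts(ids=("a1", "a2", "b1", "b2"))
only_a = everything.where(lambda i: i.startswith("a"))  # new object
first_a = only_a.limit(1)                                # another new object
```

Because every method returns a fresh object, `everything` still resolves to all four ids after the calls above, which is why the filtered collection has to be captured from the return value.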
TranscriptInfo
Transcript identifier, location, and metadata.
class TranscriptInfo(BaseModel)
Attributes
id: str
Globally unique id for transcript (e.g. sample uuid).
source_id: str
Globally unique ID for transcript source (e.g. eval_id).
source_uri: str
URI for source data (e.g. log file path).
score: JsonValue | None
Main score assigned to transcript (optional).
scores: dict[str, JsonValue]
All scores assigned to transcript.
variables: dict[str, JsonValue]
Variables (e.g. to be used in a prompt template) associated with transcript (e.g. sample metadata).
metadata: dict[str, JsonValue]
Transcript source specific metadata (e.g. model, task name, errors, epoch, dataset sample id, limits, etc.).
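As a rough illustration of how the variables field might feed a prompt template, the sketch below uses a plain dataclass in place of the real Pydantic model. ToyTranscriptInfo and render_prompt are hypothetical names, the field values are made up, and the use of string.Template is an assumption chosen for the example rather than the library's templating mechanism.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from string import Template
from typing import Any


@dataclass
class ToyTranscriptInfo:
    """Hypothetical stand-in mirroring the TranscriptInfo attributes listed above."""

    id: str
    source_id: str
    source_uri: str
    score: Any = None
    scores: dict[str, Any] = field(default_factory=dict)
    variables: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)


def render_prompt(template: str, info: ToyTranscriptInfo) -> str:
    # safe_substitute leaves unknown placeholders in place instead of raising.
    return Template(template).safe_substitute(info.variables)


info = ToyTranscriptInfo(
    id="sample-uuid-1",                  # made-up identifier
    source_id="eval-1",                  # made-up identifier
    source_uri="./logs/example.eval",    # made-up path
    variables={"difficulty": "hard"},
    metadata={"model": "gpt-4", "task_name": "cybench"},
)
prompt = render_prompt("Review this $difficulty task transcript.", info)
```

Keeping variables separate from metadata, as the attribute list does, means template inputs stay distinct from the source-specific fields used for filtering.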
Transcript
Transcript info and transcript content (messages and events).
class Transcript(TranscriptInfo)
Attributes
messages: list[ChatMessage]
Main message thread.
events: list[Event]
Events from transcript.
Column
Database column with comparison operators.
Supports various predicate functions including like(), not_like(), between(), etc. Additionally supports standard Python equality and comparison operators (e.g. ==, >, etc.).
class Column
Methods
- in_
Check if value is in a list.
def in_(self, values: list[Any]) -> Condition
values: list[Any]
- not_in
Check if value is not in a list.
def not_in(self, values: list[Any]) -> Condition
values: list[Any]
- like
SQL LIKE pattern matching (case-sensitive).
def like(self, pattern: str) -> Condition
pattern: str
- not_like
SQL NOT LIKE pattern matching (case-sensitive).
def not_like(self, pattern: str) -> Condition
pattern: str
- ilike
PostgreSQL ILIKE pattern matching (case-insensitive).
Note: For SQLite and DuckDB, this will use LIKE with LOWER() for case-insensitivity.
def ilike(self, pattern: str) -> Condition
pattern: str
- not_ilike
PostgreSQL NOT ILIKE pattern matching (case-insensitive).
Note: For SQLite and DuckDB, this will use NOT LIKE with LOWER() for case-insensitivity.
def not_ilike(self, pattern: str) -> Condition
pattern: str
- is_null
Check if value is NULL.
def is_null(self) -> Condition
- is_not_null
Check if value is not NULL.
def is_not_null(self) -> Condition
- between
Check if value is between two values.
def between(self, low: Any, high: Any) -> Condition
low: Any
Lower bound (inclusive). If None, raises ValueError.
high: Any
Upper bound (inclusive). If None, raises ValueError.
- not_between
Check if value is not between two values.
def not_between(self, low: Any, high: Any) -> Condition
low: Any
Lower bound (inclusive). If None, raises ValueError.
high: Any
Upper bound (inclusive). If None, raises ValueError.
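The idea of a column whose operators and predicate methods produce conditions rather than booleans can be sketched as follows. ToyColumn and ToyCondition are hypothetical classes written for this illustration; the quoting style, the '?' placeholders, and the handling of an empty list in in_() are assumptions and are not taken from the library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyCondition:
    """Hypothetical: a SQL fragment with '?' placeholders plus its parameters."""

    sql: str
    params: tuple[Any, ...] = ()


class ToyColumn:
    """Hypothetical column whose operators build conditions instead of booleans."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: Any) -> ToyCondition:  # type: ignore[override]
        return ToyCondition(f'"{self.name}" = ?', (other,))

    def __gt__(self, other: Any) -> ToyCondition:
        return ToyCondition(f'"{self.name}" > ?', (other,))

    def in_(self, values: list[Any]) -> ToyCondition:
        if not values:
            # Assumption for this sketch: an empty list matches nothing.
            return ToyCondition("1 = 0")
        marks = ", ".join("?" for _ in values)
        return ToyCondition(f'"{self.name}" IN ({marks})', tuple(values))

    def between(self, low: Any, high: Any) -> ToyCondition:
        # Mirrors the documented rule that None bounds raise ValueError.
        if low is None or high is None:
            raise ValueError("between() bounds must not be None")
        return ToyCondition(f'"{self.name}" BETWEEN ? AND ?', (low, high))

    def is_null(self) -> ToyCondition:
        return ToyCondition(f'"{self.name}" IS NULL')


score = ToyColumn("score")
high_score = score > 0.5
mid_range = score.between(0.2, 0.8)
chosen = ToyColumn("model").in_(["gpt-4", "gpt-4o"])
```

Values travel as parameters rather than being spliced into the SQL text, which is the usual reason for returning a (sql, params) pair from a condition.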
Condition
WHERE clause condition that can be combined with others.
class Condition
Methods
- to_sql
Generate SQL WHERE clause and parameters.
def to_sql(self, dialect: Union[SQLDialect, Literal["sqlite", "duckdb", "postgres"]] = SQLDialect.SQLITE) -> tuple[str, list[Any]]
dialect: Union[SQLDialect, Literal["sqlite", "duckdb", "postgres"]]
Target SQL dialect (sqlite, duckdb, or postgres).
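To make the shape of to_sql() concrete, here is a minimal sketch of conditions that combine with & and | and render to a (sql, params) pair. ToyCondition and the ilike() helper are hypothetical; the '$1'-style numbering for postgres is an assumption (placeholder style depends on the driver), and the LOWER() emulation follows the note on ilike above. The result is run against an in-memory SQLite database from the standard library.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyCondition:
    """Hypothetical condition: SQL text with '?' placeholders and parameters."""

    sql: str
    params: tuple[Any, ...] = ()

    def __and__(self, other: ToyCondition) -> ToyCondition:
        return ToyCondition(f"({self.sql}) AND ({other.sql})", self.params + other.params)

    def __or__(self, other: ToyCondition) -> ToyCondition:
        return ToyCondition(f"({self.sql}) OR ({other.sql})", self.params + other.params)

    def to_sql(self, dialect: str = "sqlite") -> tuple[str, list[Any]]:
        if dialect in ("sqlite", "duckdb"):
            return self.sql, list(self.params)
        if dialect == "postgres":
            # Assumption: numbered placeholders; every '?' here is a parameter.
            parts = self.sql.split("?")
            numbered = "".join(f"{p}${i}" for i, p in enumerate(parts[:-1], start=1))
            return numbered + parts[-1], list(self.params)
        raise ValueError(f"unsupported dialect: {dialect}")


def ilike(column: str, pattern: str) -> ToyCondition:
    # LOWER() on both sides emulates case-insensitive matching.
    return ToyCondition(f'LOWER("{column}") LIKE LOWER(?)', (pattern,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcripts (task_name TEXT, total_tokens INTEGER)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [("CyBench", 1200), ("math", 300), ("cybench", 90)],
)

cond = ilike("task_name", "cybench%") & ToyCondition('"total_tokens" > ?', (100,))
where_sql, where_params = cond.to_sql("sqlite")
rows = conn.execute(
    f"SELECT task_name FROM transcripts WHERE {where_sql}", where_params
).fetchall()
pg_sql, pg_params = cond.to_sql("postgres")
```

Parenthesising both sides on every combination keeps operator precedence explicit, at the cost of some redundant brackets in the generated SQL.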
Metadata
Entry point for building metadata filter expressions.
class Metadata
metadata
Metadata selector for where expressions.
Typically aliased to a more compact expression (e.g. m) for use in queries. For example:
from inspect_scout import metadata as m
filter = m.model == "gpt-4"
filter = (m.task_name == "math") & (m.epochs > 1)
metadata = Metadata()
LogMetadata
Typed metadata interface for Inspect log transcripts.
Provides typed properties for standard Inspect log columns while preserving the ability to access custom fields through the base Metadata class methods.
class LogMetadata(Metadata)
Attributes
sample_id: Column
Unique id for sample.
eval_id: Column
Globally unique id for eval.
log: Column
Location that the log file was read from.
eval_created: Column
Time eval was created.
eval_tags: Column
Tags associated with evaluation run.
eval_metadata: Column
Additional eval metadata.
task_name: Column
Task name.
task_args: Column
Task arguments.
solver: Column
Solver name.
solver_args: Column
Arguments used for invoking the solver.
model: Column
Model used for eval.
generate_config: Column
Generate config specified for model instance.
model_roles: Column
Model roles.
id: Column
Unique id for sample.
epoch: Column
Epoch number for sample.
input: Column
Sample input.
target: Column
Sample target.
sample_metadata: Column
Sample metadata.
score: Column
Headline score value.
total_tokens: Column
Total tokens used for sample.
total_time: Column
Total time that the sample was running.
working_time: Column
Time spent working (model generation, sandbox calls, etc.).
error: Column
Error that halted the sample.
limit: Column
Limit that halted the sample.
log_metadata
Log metadata selector for where expressions.
Typically aliased to a more compact expression (e.g. m) for use in queries. For example:
from inspect_scout import log_metadata as m
# typed access to standard fields
filter = m.model == "gpt-4"
filter = (m.task_name == "math") & (m.epoch > 1)
# dynamic access to custom fields
filter = m["custom_field"] > 100
log_metadata = LogMetadata()
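One way to combine typed properties with dynamic field access is sketched below using Python's attribute hooks. ToyColumn, ToyMetadata, and ToyLogMetadata are hypothetical classes for illustration only, and representing a condition as a plain (sql, params) tuple is a simplification; how the library implements this internally is not described in this reference.

```python
from __future__ import annotations

from typing import Any


class ToyColumn:
    """Hypothetical column reference that records its name and comparisons."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, value: Any) -> tuple[str, list[Any]]:  # type: ignore[override]
        return f'"{self.name}" = ?', [value]

    def __gt__(self, value: Any) -> tuple[str, list[Any]]:
        return f'"{self.name}" > ?', [value]


class ToyMetadata:
    """Hypothetical base selector: any attribute or key becomes a column."""

    def __getattr__(self, name: str) -> ToyColumn:
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        return ToyColumn(name)

    def __getitem__(self, name: str) -> ToyColumn:
        return ToyColumn(name)


class ToyLogMetadata(ToyMetadata):
    """Hypothetical typed layer: well-known columns exposed as properties."""

    @property
    def model(self) -> ToyColumn:
        return ToyColumn("model")

    @property
    def epoch(self) -> ToyColumn:
        return ToyColumn("epoch")


m = ToyLogMetadata()
typed = m.epoch > 1                   # resolved by the property
custom = m["custom_field"] > 100      # resolved by __getitem__
fallback = m.some_other_field == "x"  # resolved by __getattr__
```

The properties give editors and type checkers something to complete against, while the fallbacks keep custom fields reachable without declaring them first.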