Transcript API
Reading
transcripts_from
Read transcripts for scanning.
Transcripts may be stored in a TranscriptsDB or may be Inspect eval logs.
def transcripts_from(location: str | Logs) -> Transcripts

location: str | Logs
Transcripts location. Either a path to a transcript database or path(s) to Inspect eval logs.
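For example, reading transcripts from Inspect eval logs or from a transcript database (paths are illustrative):

from inspect_scout import transcripts_from

transcripts = transcripts_from("./logs")         # directory of Inspect eval logs
transcripts = transcripts_from("./transcripts")  # or a transcript database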
Transcript
Transcript info and transcript content (messages and events).
class Transcript(TranscriptInfo)

Attributes
transcript_id: str
Globally unique id for transcript (e.g. sample uuid).
source_type: str | None
Type of source for transcript (e.g. “eval_log”).
source_id: str | None
Globally unique ID for transcript source (e.g. eval_id).
source_uri: str | None
Optional URI for source data (e.g. log file path).
date: str | None
Date/time when the transcript was created.
task_set: str | None
Set from which transcript task was drawn (e.g. benchmark name).
task_id: str | None
Identifier for task (e.g. dataset sample id).
task_repeat: int | None
Repeat for a given task id within a task set (e.g. epoch).
agent: str | None
Agent used to execute task.
agent_args: dict[str, Any] | None
Arguments passed to create agent.
model: str | None
Main model used by agent.
model_options: dict[str, Any] | None
Generation options for main model.
score: JsonValue | None
Value indicating score on task.
success: bool | None
Boolean reduction of score to succeeded/failed.
message_count: int | None
Total messages in conversation.
total_time: float | None
Time required to execute task (seconds).
total_tokens: int | None
Tokens spent in execution of task.
error: str | None
Error message that terminated the task.
limit: str | None
Limit that caused the task to exit (e.g. “tokens”, “messages”, etc.).
metadata: dict[str, Any]
Transcript source specific metadata.
messages: list[ChatMessage]
Main message thread.
events: list[Event]
Events from transcript.
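For instance, a scanner can walk a transcript's message thread and events once content has been read (a minimal sketch, assuming a Transcript instance named transcript):

for message in transcript.messages:    # main message thread
    print(message.role, message.text)
for event in transcript.events:        # transcript events
    print(type(event).__name__)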
Transcripts
Collection of transcripts for scanning.
Transcript collections can be filtered using the where(), limit(), shuffle(), and order_by() methods. Transcripts are not modified in place, so the filtered collection should be referenced via the return value. For example:
from inspect_scout import transcripts, columns as c
transcripts = transcripts_from("./logs")
transcripts = transcripts.where(c.task_set == "cybench")

class Transcripts(abc.ABC)

Methods
- where
Filter the transcript collection by a SQL WHERE clause or Condition.
def where(self, condition: str | Condition) -> "Transcripts"
condition: str | Condition
Filter condition.
- for_validation
Filter transcripts to only those with IDs matching validation cases.
def for_validation(
    self, validation: str | ValidationSet | Mapping[str, str | ValidationSet]
) -> "Transcripts"
validation: str | ValidationSet | Mapping[str, str | ValidationSet]
Validation cases to filter by. Can be a file path (CSV, JSON, JSONL, YAML), a ValidationSet, or a dict mapping scanner names to file paths or ValidationSets.
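For example, filtering by a single validation file or by per-scanner mappings (file paths and scanner names are illustrative):

transcripts = transcripts.for_validation("validation.yaml")
transcripts = transcripts.for_validation({
    "deception": "deception_cases.jsonl",
    "refusal": "refusal_cases.csv",
})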
- limit
Limit the number of transcripts processed.
def limit(self, n: int) -> "Transcripts"
n: int
Limit on transcripts.
- shuffle
Shuffle the order of transcripts.
def shuffle(self, seed: int | None = None) -> "Transcripts"
seed: int | None
Random seed for shuffling.
- order_by
Order transcripts by column.
Can be chained multiple times for tie-breaking. If shuffle() is also used, shuffle takes precedence.
def order_by(
    self, column: Column, direction: Literal["ASC", "DESC"] = "ASC"
) -> "Transcripts"
column: Column
Column to sort by.
direction: Literal['ASC', 'DESC']
Sort direction (“ASC” or “DESC”).
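For example, ordering by date descending with task_id as a tie-breaker:

from inspect_scout import columns as c

transcripts = transcripts.order_by(c.date, "DESC").order_by(c.task_id)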
- reader
Read the selected transcripts.
@abc.abstractmethod
def reader(self, snapshot: ScanTranscripts | None = None) -> TranscriptsReader
snapshot: ScanTranscripts | None
An optional snapshot that provides hints to make the reader more efficient (e.g. by preventing a full scan to find transcript_id => filename mappings).
- from_snapshot
Restore transcripts from a snapshot.
@staticmethod
@abc.abstractmethod
def from_snapshot(snapshot: ScanTranscripts) -> "Transcripts"
snapshot: ScanTranscripts
TranscriptsReader
Read transcripts based on a TranscriptsQuery.
class TranscriptsReader(abc.ABC)

Methods
- index
Index of TranscriptInfo for the collection.
@abc.abstractmethod
def index(self) -> AsyncIterator[TranscriptInfo]
- read
Read transcript content.
@abc.abstractmethod
async def read(
    self, transcript: TranscriptInfo, content: TranscriptContent
) -> Transcript
transcript: TranscriptInfo
Transcript to read.
content: TranscriptContent
Content to read (e.g. specific message types, etc.).
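A sketch of iterating the index and reading content (assumes an async context and a TranscriptContent value named content describing which messages/events to load):

reader = transcripts.reader()
async for info in reader.index():
    transcript = await reader.read(info, content)
    print(transcript.transcript_id, transcript.message_count)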
SampleMetadata
Typed accessor for sample metadata from Inspect eval logs.
Provides typed properties for accessing metadata fields specific to Inspect eval logs, while preserving the lazy JSON parsing optimization. Raises an error if the transcript is not from an Inspect eval log.
class SampleMetadata

Attributes
eval_id: str
Globally unique id for eval. Same as EvalLog.eval.eval_id.
log: str
Location that the log file was read from. Same as EvalLog.location.
eval_status: EvalStatus
Status of eval. Same as EvalLog.status.
eval_tags: list[str] | None
Tags associated with evaluation run. Same as EvalLog.eval.tags.
eval_metadata: dict[str, Any] | None
Additional eval metadata. Same as EvalLog.eval.metadata.
task_args: dict[str, Any]
Task arguments. Same as EvalLog.eval.task_args.
generate_config: GenerateConfig
Generate config for model instance. Same as EvalLog.eval.model_generate_config.
model_roles: dict[str, Any] | None
Model roles. Same as EvalLog.eval.model_roles.
id: str
Unique id for sample. Same as str(EvalSampleSummary.id).
epoch: int
Epoch number for sample. Same as EvalSampleSummary.epoch.
input: str
Sample input. Derived from EvalSampleSummary.input (converted to string).
target: list[str]
Sample target value(s). Derived from EvalSampleSummary.target. Stored as a comma-separated string; parsed back to a list.
sample_metadata: dict[str, Any]
Sample metadata. Same as EvalSampleSummary.metadata.
working_time: float | None
Working time for the sample. Same as EvalSampleSummary.working_time.
score_values: dict[str, Value]
Score values for this sample. Derived from EvalSampleSummary.scores. Note: only score values are stored in transcript metadata, not full Score objects (answer, explanation, and metadata are not available).
Methods
- __init__
Initialize SampleMetadata wrapper.
def __init__(self, transcript: Transcript) -> None
transcript: Transcript
A Transcript from an Inspect eval log.
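For example (a sketch; assumes transcript originates from an Inspect eval log):

meta = SampleMetadata(transcript)
print(meta.eval_id, meta.epoch)
print(meta.score_values)  # score values only (no full Score objects)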
Database
transcripts_db
Read/write interface to transcripts database.
def transcripts_db(location: str) -> TranscriptsDB

location: str
Database location (e.g. directory or S3 bucket).
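For example (locations are illustrative):

from inspect_scout import transcripts_db

db = transcripts_db("./transcripts")               # local directory
db = transcripts_db("s3://my-bucket/transcripts")  # or an S3 bucket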
transcripts_db_schema
Get transcript database schema in various formats.
def transcripts_db_schema(
    format: Literal["pyarrow", "avro", "json", "pandas"] = "pyarrow",
) -> pa.Schema | dict[str, Any] | pd.DataFrame

format: Literal['pyarrow', 'avro', 'json', 'pandas']
Output format:
- “pyarrow”: PyArrow Schema for creating Parquet files
- “avro”: Avro schema as dict (JSON-serializable)
- “json”: JSON Schema as dict
- “pandas”: Empty DataFrame with correct dtypes
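For example:

from inspect_scout import transcripts_db_schema

schema = transcripts_db_schema()             # PyArrow schema (default)
json_schema = transcripts_db_schema("json")  # JSON Schema as dict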
TranscriptsDB
Database of transcripts with write capability.
class TranscriptsDB(TranscriptsView)

Methods
- connect
Connect to transcripts database.
@abc.abstractmethod
async def connect(self) -> None
- disconnect
Disconnect from transcripts database.
@abc.abstractmethod
async def disconnect(self) -> None
- transcript_ids
Get transcript IDs matching query.
Optimized method that returns only transcript IDs without loading full metadata.
@abc.abstractmethod
async def transcript_ids(self, query: Query | None = None) -> dict[str, str | None]
query: Query | None
Query with where/limit/shuffle/order_by criteria.
- select
Select transcripts matching query.
@abc.abstractmethod
def select(self, query: Query | None = None) -> AsyncIterator[TranscriptInfo]
query: Query | None
Query with where/limit/shuffle/order_by criteria.
- count
Count transcripts matching query.
@abc.abstractmethod
async def count(self, query: Query | None = None) -> int
query: Query | None
Query with where criteria (limit/shuffle/order_by ignored).
- read
Read transcript content.
@abc.abstractmethod
async def read(
    self,
    t: TranscriptInfo,
    content: TranscriptContent,
    max_bytes: int | None = None,
) -> Transcript
t: TranscriptInfo
Transcript to read.
content: TranscriptContent
Content to read (messages, events, etc.).
max_bytes: int | None
Max content size in bytes. Raises TranscriptTooLargeError if exceeded.
- read_messages_events
Get messages/events stream handle.
Returns TranscriptMessagesAndEvents with a lazy data context manager. The stream is not opened until data is entered. The caller can safely hold the result after the view context exits.
Note: The JSON may contain an ‘attachments’ dict at the top level. Strings within ‘messages’ and ‘events’ may contain references like ‘attachment://<32-char-hex-id>’ that must be resolved by looking up the ID in the ‘attachments’ dict, as sketched below.
@abc.abstractmethod
async def read_messages_events(
    self, t: TranscriptInfo
) -> TranscriptMessagesAndEvents
t: TranscriptInfo
Transcript to read messages/events for.
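A sketch of resolving attachment references in the raw JSON (the helper below is illustrative, not part of the API):

def resolve_attachments(value, attachments: dict[str, str]):
    # Recursively replace 'attachment://<id>' strings with their content.
    if isinstance(value, str) and value.startswith("attachment://"):
        return attachments.get(value[len("attachment://"):], value)
    if isinstance(value, list):
        return [resolve_attachments(v, attachments) for v in value]
    if isinstance(value, dict):
        return {k: resolve_attachments(v, attachments) for k, v in value.items()}
    return value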
- distinct
Get distinct values of a column, sorted ascending.
@abc.abstractmethod
async def distinct(
    self, column: str, condition: Condition | None
) -> list[ScalarValue]
column: str
Column to get distinct values for.
condition: Condition | None
Filter condition, or None for no filter.
- insert
Insert transcripts into database.
@abc.abstractmethod
async def insert(
    self,
    transcripts: Iterable[Transcript] | AsyncIterable[Transcript] | Transcripts | pa.RecordBatchReader,
    session_id: str | None = None,
    commit: bool = True,
) -> None
transcripts: Iterable[Transcript] | AsyncIterable[Transcript] | Transcripts | pa.RecordBatchReader
Transcripts to insert (iterable, async iterable, or source).
session_id: str | None
Optional session ID to include in parquet filenames. Used for session-scoped compaction at commit time.
commit: bool
If True (default), commit after insert (compact + index). If False, defer commit for batch operations. Call commit() explicitly when ready to finalize.
- commit
Commit pending changes.
For parquet: compacts data files and rebuilds the index. For relational DBs: may be a no-op or a transaction commit.
This is called automatically when insert() is called with commit=True (the default). Only call this manually when using commit=False with insert() for batch operations.
@abc.abstractmethod
async def commit(self, session_id: str | None = None) -> None
session_id: str | None
Optional session ID for session-scoped compaction. When provided, parquet files created during this session are compacted into fewer, larger files before index compaction.
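For example, deferring the commit across several inserts (a sketch; batches is an illustrative iterable of transcript batches, and the session ID is illustrative):

for batch in batches:
    await db.insert(batch, session_id="import-1", commit=False)
await db.commit(session_id="import-1")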
Filtering
Column
Database column with comparison operators.
Supports various predicate functions including like(), not_like(), between(), etc. Additionally supports standard Python equality and comparison operators (e.g. ==, >, etc.).
class Column

Methods
- in_
Check if value is in a list.
def in_(self, values: list[Any]) -> Condition
values: list[Any]
- not_in
Check if value is not in a list.
def not_in(self, values: list[Any]) -> Condition
values: list[Any]
- like
SQL LIKE pattern matching (case-sensitive).
def like(self, pattern: str) -> Condition
pattern: str
- not_like
SQL NOT LIKE pattern matching (case-sensitive).
def not_like(self, pattern: str) -> Condition
pattern: str
- ilike
PostgreSQL ILIKE pattern matching (case-insensitive).
Note: For SQLite and DuckDB, this will use LIKE with LOWER() for case-insensitivity.
def ilike(self, pattern: str) -> Condition
pattern: str
- not_ilike
PostgreSQL NOT ILIKE pattern matching (case-insensitive).
Note: For SQLite and DuckDB, this will use NOT LIKE with LOWER() for case-insensitivity.
def not_ilike(self, pattern: str) -> Condition
pattern: str
- is_null
Check if value is NULL.
def is_null(self) -> Condition
- is_not_null
Check if value is not NULL.
def is_not_null(self) -> Condition
- between
Check if value is between two values.
def between(self, low: Any, high: Any) -> Condition
low: Any
Lower bound (inclusive). If None, raises ValueError.
high: Any
Upper bound (inclusive). If None, raises ValueError.
- not_between
Check if value is not between two values.
def not_between(self, low: Any, high: Any) -> Condition
low: Any
Lower bound (inclusive). If None, raises ValueError.
high: Any
Upper bound (inclusive). If None, raises ValueError.
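For example (column values are illustrative):

from inspect_scout import columns as c

condition = c.model.like("gpt-4%")
condition = c.score.between(0.5, 1.0)
condition = c.task_set.in_(["cybench", "math"])
condition = c.error.is_not_null()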
Condition
WHERE clause condition that can be combined with others.
class Condition(BaseModel)

Attributes
left: str | 'Condition' | None
Column name (simple) or left operand (compound).
operator: Operator | LogicalOperator | None
Comparison operator (simple) or logical operator (compound).
right: 'Condition' | list[ScalarValue] | tuple[ScalarValue, ScalarValue] | ScalarValue
Comparison value (simple) or right operand (compound).
is_compound: bool
True for AND/OR/NOT conditions, False for simple comparisons.
params: list[ScalarValue]
SQL parameters extracted from the condition for parameterized queries.
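Conditions produced by Column operators can be combined and passed to where(). For example:

from inspect_scout import columns as c

condition = (c.model == "gpt-4") & (c.success == True)
transcripts = transcripts.where(condition)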
Columns
Entry point for building filter expressions.
Supports both dot notation and bracket notation for accessing columns:
from inspect_scout import columns as c
c.column_name
c["column_name"]
c["nested.json.path"]class ColumnsAttributes
transcript_id: Column
Globally unique identifier for transcript.
source_type: Column
Type of transcript source (e.g. “eval_log”, “weave”, etc.).
source_id: Column
Globally unique identifier of transcript source (e.g. ‘eval_id’ in Inspect logs).
source_uri: Column
URI for source data (e.g. full path to the Inspect log file or weave op).
date: Column
Date transcript was created.
task_set: Column
Set from which transcript task was drawn (e.g. benchmark name).
task_id: Column
Identifier for task (e.g. dataset sample id).
task_repeat: Column
Repeat for a given task id within a task set (e.g. epoch).
agent: Column
Agent name.
agent_args: Column
Agent args.
model: Column
Model used for eval.
model_options: Column
Generation options for model.
score: Column
Headline score value.
success: Column
Reduction of ‘score’ to True/False success.
message_count: Column
Messages in conversation.
total_time: Column
Total execution time.
error: Column
Error that halted execution.
limit: Column
Limit that halted execution.
columns
Column selector for where expressions.
Typically aliased to a more compact expression (e.g. c) for use in queries. For example:
from inspect_scout import columns as c
filter = c.model == "gpt-4"
filter = (c.task_set == "math") & (c.epochs > 1)columns = Columns()LogColumns
Typed column interface for Inspect log transcripts.
Provides typed properties for standard Inspect log columns while preserving the ability to access custom fields through the base Metadata class methods.
class LogColumns(Columns)

Attributes
transcript_id: Column
Globally unique identifier for transcript.
source_type: Column
Type of transcript source (e.g. “eval_log”, “weave”, etc.).
source_id: Column
Globally unique identifier of transcript source (e.g. ‘eval_id’ in Inspect logs).
source_uri: Column
URI for source data (e.g. full path to the Inspect log file or weave op).
date: Column
Date transcript was created.
task_set: Column
Set from which transcript task was drawn (e.g. benchmark name).
task_id: Column
Identifier for task (e.g. dataset sample id).
task_repeat: Column
Repeat for a given task id within a task set (e.g. epoch).
agent: Column
Agent name.
agent_args: Column
Agent args.
model: Column
Model used for eval.
model_options: Column
Generation options for model.
score: Column
Headline score value.
success: Column
Reduction of ‘score’ to True/False success.
message_count: Column
Messages in conversation.
total_time: Column
Total execution time.
error: Column
Error that halted execution.
limit: Column
Limit that halted execution.
sample_id: Column
Unique id for sample.
eval_id: Column
Globally unique id for eval.
eval_status: Column
Status of eval.
log: Column
Location that the log file was read from.
eval_tags: Column
Tags associated with evaluation run.
eval_metadata: Column
Additional eval metadata.
task_args: Column
Task arguments.
generate_config: Column
Generate config specified for model instance.
model_roles: Column
Model roles.
id: Column
Unique id for sample.
epoch: Column
Epoch number for sample.
input: Column
Sample input.
target: Column
Sample target.
sample_metadata: Column
Sample metadata.
working_time: Column
Time spent working (model generation, sandbox calls, etc.).
log_columns
Log columns selector for where expressions.
Typically aliased to a more compact expression (e.g. c) for use in queries. For example:
from inspect_scout import log_columns as c
# typed access to standard fields
filter = c.model == "gpt-4"
filter = (c.task_set == "math") & (c.epochs > 1)
# dynamic access to custom fields
filter = c["custom_field"] > 100log_columns = LogColumns()Observe
observe
Observe decorator/context manager for transcript capture.
Works as a decorator (@observe, @observe(), @observe(task_set=“x”)) or as a context manager (async with observe():).
Uses implicit leaf detection: the innermost observe context (one with no children) triggers transcript write to the database. This allows nesting observe contexts where the outer context sets shared parameters and inner contexts represent individual transcript entries.
def observe(
func: Callable[OP, Awaitable[OR]] | None = None,
*,
provider: (
Literal["inspect", "openai", "anthropic", "google"]
| ObserveProvider
| Sequence[ObserveProviderName | ObserveProvider]
| None
) = "inspect",
db: str | TranscriptsDB | None = None,
# TranscriptInfo fields (ordered to match TranscriptInfo for consistency)
source_type: str = "observe",
source_id: str | None = None,
source_uri: str | None = None,
task_set: str | None = None,
task_id: str | None = None,
task_repeat: int | None = None,
agent: str | None = None,
agent_args: dict[str, Any] | None = None,
model: str | None = None,
model_options: dict[str, Any] | None = None,
metadata: dict[str, Any] | None = None,
# Full TranscriptInfo for advanced use
info: TranscriptInfo | None = None,
) -> Callable[OP, Awaitable[OR]] | _ObserveContextManager

func: Callable[OP, Awaitable[OR]] | None
The async function to decorate (when used as @observe without parens).
provider: Literal['inspect', 'openai', 'anthropic', 'google'] | ObserveProvider | Sequence[ObserveProviderName | ObserveProvider] | None
Provider(s) for capturing LLM calls. Can be a provider name (“inspect”, “openai”, “anthropic”, “google”), an ObserveProvider instance, or a sequence of either. Defaults to “inspect”, which captures Inspect AI model.generate() calls. Use other providers to capture direct SDK calls. Can only be set on the root observe.
db: str | TranscriptsDB | None
Transcript database or path for writing. Can be a TranscriptsDB instance or a string path (which will be passed to transcripts_db()). Only valid on the outermost observe; defaults to the project transcripts directory.
source_type: str
Type of source for transcript. Defaults to “observe”.
source_id: str | None
Globally unique ID for transcript source (e.g. eval_id).
source_uri: str | None
URI for source data (e.g. log file path).
task_set: str | None
Set from which transcript task was drawn (e.g. benchmark name).
task_id: str | None
Identifier for task (e.g. dataset sample id).
task_repeat: int | None
Repeat for a given task id within a task set (e.g. epoch).
agent: str | None
Agent used to execute task.
agent_args: dict[str, Any] | None
Arguments passed to create agent.
model: str | None
Main model used by agent.
model_options: dict[str, Any] | None
Generation options for main model.
metadata: dict[str, Any] | None
Transcript source specific metadata (merged with parent).
info: TranscriptInfo | None
Full TranscriptInfo for advanced use (fields override parent; explicit args override info).
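For example, as a decorator and as nested context managers (task and field values are illustrative):

from inspect_scout import observe

@observe(task_set="support", model="gpt-4")
async def run_task(task_id: str) -> None:
    ...  # agent logic; LLM calls are captured by the provider

async def run_batch() -> None:
    async with observe(task_set="support"):    # outer: shared parameters
        async with observe(task_id="case-1"):  # innermost leaf: writes a transcript
            ...                                # agent logic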
observe_update
Update the current observe context’s TranscriptInfo fields.
Call this from within an @observe decorated function or observe() context to set transcript fields after execution (e.g., score, success, limit).
def observe_update(
*,
source_type: str | None = None,
source_id: str | None = None,
source_uri: str | None = None,
task_set: str | None = None,
task_id: str | None = None,
task_repeat: int | None = None,
agent: str | None = None,
agent_args: dict[str, Any] | None = None,
model: str | None = None,
model_options: dict[str, Any] | None = None,
score: JsonValue | None = None,
success: bool | None = None,
limit: str | None = None,
metadata: dict[str, Any] | None = None,
) -> None

source_type: str | None
Type of source for transcript.
source_id: str | None
Globally unique ID for transcript source.
source_uri: str | None
URI for source data.
task_set: str | None
Set from which transcript task was drawn.
task_id: str | None
Identifier for task.
task_repeat: int | None
Repeat for a given task id within a task set.
agent: str | None
Agent used to execute task.
agent_args: dict[str, Any] | None
Arguments passed to create agent.
model: str | None
Main model used by agent.
model_options: dict[str, Any] | None
Generation options for main model.
score: JsonValue | None
Value indicating score on task.
success: bool | None
Boolean reduction of score to succeeded/failed.
limit: str | None
Limit that caused the task to exit.
metadata: dict[str, Any] | None
Transcript source specific metadata (merged, not replaced).
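For example, recording a score from within an observed function (a sketch; grade_response is illustrative):

from inspect_scout import observe, observe_update

@observe(task_set="support")
async def run_task() -> None:
    grade = await grade_response()  # illustrative scoring call
    observe_update(score=grade, success=grade > 0.5)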
ObserveProvider
Protocol for LLM capture providers.
@runtime_checkable
class ObserveProvider(Protocol)

Methods
- install
Install hooks/patches for capturing LLM calls.
Called once per provider class. Implementations should be idempotent.
def install(self, emit: ObserveEmit) -> None
emit: ObserveEmit
Sync callback to emit raw captured data. Call with a dict containing request/response data. The framework handles context checking - emit() is a no-op if not inside an observe context. The dict structure is provider-defined and passed to build_event().
- build_event
Convert raw captured data to an Inspect Event.
Called by the framework at observe exit for each captured item. This is where async conversion (using Inspect AI converters) happens.
async def build_event(self, data: dict[str, Any]) -> Event
data: dict[str, Any]
The dict passed to emit() during capture.
ObserveEmit
Sync function to emit raw captured data. Called by provider wrappers.
ObserveEmit = Callable[[dict[str, Any]], None]
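A minimal provider sketch (the capture mechanism and event construction are illustrative; a real provider would patch an SDK client and build events with Inspect AI converters):

class MyProvider:
    def install(self, emit: ObserveEmit) -> None:
        # Patch the target SDK so each request/response pair calls
        # emit({...}); guard so repeated installs are idempotent.
        ...

    async def build_event(self, data: dict[str, Any]) -> Event:
        # Convert the raw dict captured via emit() into an Inspect Event.
        ...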