Transcript API
Reading
transcripts_from
Read transcripts for scanning.
Transcripts may be stored in a TranscriptDB or may be Inspect eval logs.
def transcripts_from(location: str | Logs) -> Transcripts

location: str | Logs
Transcripts location. Either a path to a transcript database or path(s) to Inspect eval logs.
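A minimal usage sketch (hypothetical: the "./logs" path and "cybench" task set are illustrative placeholders, and this assumes inspect_scout is installed):

```python
# Load transcripts from a directory of Inspect eval logs and filter them.
# The import happens inside the function so this sketch is definition-only.
async def load_cybench_transcripts():
    from inspect_scout import transcripts_from, columns as c

    # Point at a directory of Inspect eval logs (or a transcript database)
    transcripts = transcripts_from("./logs")

    # Filtering returns a new collection; the original is unchanged
    return transcripts.where(c.task_set == "cybench").limit(10)
```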
Transcript
Transcript info and transcript content (messages and events).
class Transcript(TranscriptInfo)

Attributes

transcript_id: str
Globally unique id for transcript (e.g. sample uuid).

source_type: str | None
Type of source for transcript (e.g. "eval_log").

source_id: str | None
Globally unique ID for transcript source (e.g. eval_id).

source_uri: str | None
Optional. URI for source data (e.g. log file path).

date: str | None
Date/time when the transcript was created.

task_set: str | None
Set from which transcript task was drawn (e.g. benchmark name).

task_id: str | None
Identifier for task (e.g. dataset sample id).

task_repeat: int | None
Repeat for a given task id within a task set (e.g. epoch).

agent: str | None
Agent used to execute task.

agent_args: dict[str, Any] | None
Arguments passed to create agent.

model: str | None
Main model used by agent.

model_options: dict[str, Any] | None
Generation options for main model.

score: JsonValue | None
Value indicating score on task.

success: bool | None
Boolean reduction of score to succeeded/failed.

message_count: int | None
Total messages in conversation.

total_time: float | None
Time required to execute task (seconds).

total_tokens: int | None
Tokens spent in execution of task.

error: str | None
Error message that terminated the task.

limit: str | None
Limit that caused the task to exit (e.g. "tokens", "messages", etc.).

metadata: dict[str, Any]
Transcript source specific metadata.

messages: list[ChatMessage]
Main message thread.

events: list[Event]
Events from transcript.

timelines: list[Timeline]
Timeline views over the transcript.
TranscriptInfo
Transcript identifier, location, and metadata.
class TranscriptInfo(BaseModel)

Attributes

transcript_id: str
Globally unique id for transcript (e.g. sample uuid).

source_type: str | None
Type of source for transcript (e.g. "eval_log").

source_id: str | None
Globally unique ID for transcript source (e.g. eval_id).

source_uri: str | None
Optional. URI for source data (e.g. log file path).

date: str | None
Date/time when the transcript was created.

task_set: str | None
Set from which transcript task was drawn (e.g. benchmark name).

task_id: str | None
Identifier for task (e.g. dataset sample id).

task_repeat: int | None
Repeat for a given task id within a task set (e.g. epoch).

agent: str | None
Agent used to execute task.

agent_args: dict[str, Any] | None
Arguments passed to create agent.

model: str | None
Main model used by agent.

model_options: dict[str, Any] | None
Generation options for main model.

score: JsonValue | None
Value indicating score on task.

success: bool | None
Boolean reduction of score to succeeded/failed.

message_count: int | None
Total messages in conversation.

total_time: float | None
Time required to execute task (seconds).

total_tokens: int | None
Tokens spent in execution of task.

error: str | None
Error message that terminated the task.

limit: str | None
Limit that caused the task to exit (e.g. "tokens", "messages", etc.).

metadata: dict[str, Any]
Transcript source specific metadata.
Transcripts
Collection of transcripts for scanning.
Transcript collections can be filtered using the where(), limit(), shuffle(), and order_by() methods. The collection is not modified in place, so the filtered transcripts should be referenced via the return value. For example:
from inspect_scout import transcripts, columns as c
transcripts = transcripts_from("./logs")
transcripts = transcripts.where(c.task_set == "cybench")

class Transcripts(abc.ABC)

Methods
- where

Filter the transcript collection by a SQL WHERE clause or Condition.

def where(self, condition: str | Condition) -> "Transcripts"

condition: str | Condition
Filter condition.
- for_validation

Filter transcripts to only those with IDs matching validation cases.

def for_validation(
    self, validation: str | ValidationSet | Mapping[str, str | ValidationSet]
) -> "Transcripts"

validation: str | ValidationSet | Mapping[str, str | ValidationSet]
Validation cases to filter by. Can be a file path (CSV, JSON, JSONL, YAML), a ValidationSet, or a dict mapping scanner names to file paths or ValidationSets.
- limit

Limit the number of transcripts processed.

def limit(self, n: int) -> "Transcripts"

n: int
Limit on transcripts.
- shuffle

Shuffle the order of transcripts.

def shuffle(self, seed: int | None = None) -> "Transcripts"

seed: int | None
Random seed for shuffling.
- order_by

Order transcripts by column.

Can be chained multiple times for tie-breaking. If shuffle() is also used, shuffle takes precedence.

def order_by(
    self, column: Column, direction: Literal["ASC", "DESC"] = "ASC"
) -> "Transcripts"

column: Column
Column to sort by.

direction: Literal['ASC', 'DESC']
Sort direction ("ASC" or "DESC").
- reader

Read the selected transcripts.

@abc.abstractmethod
def reader(self, snapshot: ScanTranscripts | None = None) -> TranscriptsReader

snapshot: ScanTranscripts | None
An optional snapshot which provides hints to make the reader more efficient (e.g. by preventing a full scan to find transcript_id => filename mappings).
- from_snapshot

Restore transcripts from a snapshot.

@staticmethod
@abc.abstractmethod
def from_snapshot(snapshot: ScanTranscripts) -> "Transcripts"

snapshot: ScanTranscripts
TranscriptsReader
Read transcripts based on a TranscriptsQuery.
class TranscriptsReader(abc.ABC)

Methods

- index

Index of TranscriptInfo for the collection.

@abc.abstractmethod
def index(self) -> AsyncIterator[TranscriptInfo]

- read

Read transcript content.

@abc.abstractmethod
async def read(
    self, transcript: TranscriptInfo, content: TranscriptContent
) -> Transcript

transcript: TranscriptInfo
Transcript to read.

content: TranscriptContent
Content to read (e.g. specific message types, etc.)
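The index/read pattern can be sketched with a small helper. This is structural only: it works with any objects exposing the documented reader interface, and the content argument is passed through unchanged.

```python
# Iterate a reader's index and load each transcript's content.
# transcripts: a Transcripts collection; content: a TranscriptContent
# value describing which messages/events to load.
async def scan_all(transcripts, content):
    reader = transcripts.reader()
    results = []
    async for info in reader.index():      # iterate TranscriptInfo entries
        t = await reader.read(info, content)  # load messages/events
        results.append(t)
    return results
```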
SampleMetadata
Typed accessor for sample metadata from Inspect eval logs.
Provides typed properties for accessing metadata fields specific to Inspect eval logs, while preserving the lazy JSON parsing optimization. Raises an error if the transcript is not from an Inspect eval log.
class SampleMetadata

Attributes

eval_id: str
Globally unique id for eval. Same as EvalLog.eval.eval_id.

log: str
Location that the log file was read from. Same as EvalLog.location.

eval_status: EvalStatus
Status of eval. Same as EvalLog.status.

eval_tags: list[str] | None
Tags associated with evaluation run. Same as EvalLog.eval.tags.

eval_metadata: dict[str, Any] | None
Additional eval metadata. Same as EvalLog.eval.metadata.

task_args: dict[str, Any]
Task arguments. Same as EvalLog.eval.task_args.

generate_config: GenerateConfig
Generate config for model instance. Same as EvalLog.eval.model_generate_config.

model_roles: dict[str, Any] | None
Model roles. Same as EvalLog.eval.model_roles.

id: str
Unique id for sample. Same as str(EvalSampleSummary.id).

epoch: int
Epoch number for sample. Same as EvalSampleSummary.epoch.

input: str
Sample input. Derived from EvalSampleSummary.input (converted to string).

target: list[str]
Sample target value(s). Derived from EvalSampleSummary.target. Stored as a comma-separated string; parsed back to a list.

sample_metadata: dict[str, Any]
Sample metadata. Same as EvalSampleSummary.metadata.

working_time: float | None
Working time for the sample. Same as EvalSampleSummary.working_time.

score_values: dict[str, Value]
Score values for this sample. Derived from EvalSampleSummary.scores. Note: only score values are stored in transcript metadata, not full Score objects (answer, explanation, and metadata are not available).
Methods

- __init__

Initialize SampleMetadata wrapper.

def __init__(self, transcript: Transcript) -> None

transcript: Transcript
A Transcript from an Inspect eval log.
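A hypothetical usage sketch (assumes inspect_scout is installed and that SampleMetadata is importable from the package root, which is an assumption):

```python
# Wrap a Transcript that came from an Inspect eval log to get typed
# access to its sample metadata. Definition-only sketch.
def sample_summary(transcript):
    from inspect_scout import SampleMetadata  # assumed import location

    meta = SampleMetadata(transcript)  # raises if not from an eval log
    return meta.epoch, meta.input, meta.score_values
```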
Database
transcripts_db
Read/write interface to transcripts database.
def transcripts_db(location: str) -> TranscriptsDB

location: str
Database location (e.g. directory or S3 bucket).
transcripts_db_schema
Get transcript database schema in various formats.
def transcripts_db_schema(
    format: Literal["pyarrow", "avro", "json", "pandas"] = "pyarrow",
) -> pa.Schema | dict[str, Any] | pd.DataFrame

format: Literal['pyarrow', 'avro', 'json', 'pandas']
Output format:
- "pyarrow": PyArrow Schema for creating Parquet files
- "avro": Avro schema as dict (JSON-serializable)
- "json": JSON Schema as dict
- "pandas": Empty DataFrame with correct dtypes
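A sketch of using the PyArrow schema to create an empty, correctly-typed Parquet file (assumes inspect_scout and pyarrow are installed; definition-only):

```python
# Create an empty Parquet file that matches the transcript schema.
def empty_transcripts_parquet(path: str) -> None:
    import pyarrow as pa
    import pyarrow.parquet as pq
    from inspect_scout import transcripts_db_schema

    schema = transcripts_db_schema("pyarrow")  # pa.Schema
    pq.write_table(pa.Table.from_pylist([], schema=schema), path)
```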
TranscriptsDB
Database of transcripts with write capability.
class TranscriptsDB(TranscriptsView)

Methods

- connect

Connect to transcripts database.

@abc.abstractmethod
async def connect(self) -> None

- disconnect

Disconnect from transcripts database.

@abc.abstractmethod
async def disconnect(self) -> None

- transcript_ids

Get transcript IDs matching query.

Optimized method that returns only transcript IDs without loading full metadata.

@abc.abstractmethod
async def transcript_ids(self, query: Query | None = None) -> dict[str, str | None]

query: Query | None
Query with where/limit/shuffle/order_by criteria.
- select

Select transcripts matching query.

@abc.abstractmethod
def select(self, query: Query | None = None) -> AsyncIterator[TranscriptInfo]

query: Query | None
Query with where/limit/shuffle/order_by criteria.

- count

Count transcripts matching query.

@abc.abstractmethod
async def count(self, query: Query | None = None) -> int

query: Query | None
Query with where criteria (limit/shuffle/order_by ignored).
- read

Read transcript content.

@abc.abstractmethod
async def read(
    self, t: TranscriptInfo, content: TranscriptContent, max_bytes: int | None = None,
) -> Transcript

t: TranscriptInfo
Transcript to read.

content: TranscriptContent
Content to read (messages, events, etc.)

max_bytes: int | None
Max content size in bytes. Raises TranscriptTooLargeError if exceeded.
- read_messages_events

Get messages/events stream handle.

Returns TranscriptMessagesAndEvents with a lazy 'data' context manager. The stream is not opened until 'data' is entered. The caller can safely hold the result after the view context exits.

Note: the JSON may contain an 'attachments' dict at the top level. Strings within 'messages' and 'events' may contain references like 'attachment://<32-char-hex-id>' that must be resolved by looking up the ID in the 'attachments' dict.

@abc.abstractmethod
async def read_messages_events(
    self, t: TranscriptInfo
) -> TranscriptMessagesAndEvents

t: TranscriptInfo
Transcript to read messages/events for.
- distinct

Get distinct values of a column, sorted ascending.

@abc.abstractmethod
async def distinct(
    self, column: str, condition: Condition | None
) -> list[ScalarValue]

column: str
Column to get distinct values for.

condition: Condition | None
Filter condition, or None for no filter.
- insert

Insert transcripts into database.

@abc.abstractmethod
async def insert(
    self,
    transcripts: Iterable[Transcript] | AsyncIterable[Transcript] | Transcripts | pa.RecordBatchReader,
    session_id: str | None = None,
    commit: bool = True,
) -> None

transcripts: Iterable[Transcript] | AsyncIterable[Transcript] | Transcripts | pa.RecordBatchReader
Transcripts to insert (iterable, async iterable, or source).

session_id: str | None
Optional session ID to include in parquet filenames. Used for session-scoped compaction at commit time.

commit: bool
If True (default), commit after insert (compact + index). If False, defer commit for batch operations and call commit() explicitly when ready to finalize.
- commit

Commit pending changes.

For parquet: compacts data files and rebuilds the index. For relational DBs: may be a no-op or a transaction commit.

This is called automatically when insert() is called with commit=True (the default). Only call it manually when using commit=False with insert() for batch operations.

@abc.abstractmethod
async def commit(self, session_id: str | None = None) -> None

session_id: str | None
Optional session ID for session-scoped compaction. When provided, parquet files created during this session are compacted into fewer, larger files before index compaction.
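The deferred-commit pattern described above can be sketched as follows (assumes inspect_scout is installed; the "./transcripts" path and session id are placeholders; definition-only):

```python
# Insert several batches with commit deferred, then compact and index once.
async def bulk_insert(batches):
    from inspect_scout import transcripts_db

    db = transcripts_db("./transcripts")
    await db.connect()
    try:
        for batch in batches:
            # defer compaction/indexing until all batches are written
            await db.insert(batch, session_id="bulk-load", commit=False)
        await db.commit(session_id="bulk-load")  # compact + index once
    finally:
        await db.disconnect()
```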
Filtering
Column
Database column with comparison operators.
Supports various predicate functions including like(), not_like(), between(), etc. Additionally supports standard Python equality and comparison operators (e.g. ==, >, etc.).
class Column

Methods

- in_

Check if value is in a list.

def in_(self, values: list[Any]) -> Condition

values: list[Any]
- not_in

Check if value is not in a list.

def not_in(self, values: list[Any]) -> Condition

values: list[Any]

- like

SQL LIKE pattern matching (case-sensitive).

def like(self, pattern: str) -> Condition

pattern: str

- not_like

SQL NOT LIKE pattern matching (case-sensitive).

def not_like(self, pattern: str) -> Condition

pattern: str

- ilike

PostgreSQL ILIKE pattern matching (case-insensitive).

Note: for SQLite and DuckDB, this will use LIKE with LOWER() for case-insensitivity.

def ilike(self, pattern: str) -> Condition

pattern: str

- not_ilike

PostgreSQL NOT ILIKE pattern matching (case-insensitive).

Note: for SQLite and DuckDB, this will use NOT LIKE with LOWER() for case-insensitivity.

def not_ilike(self, pattern: str) -> Condition

pattern: str
- is_null

Check if value is NULL.

def is_null(self) -> Condition

- is_not_null

Check if value is not NULL.

def is_not_null(self) -> Condition

- between

Check if value is between two values.

def between(self, low: Any, high: Any) -> Condition

low: Any
Lower bound (inclusive). If None, raises ValueError.

high: Any
Upper bound (inclusive). If None, raises ValueError.
- not_between

Check if value is not between two values.

def not_between(self, low: Any, high: Any) -> Condition

low: Any
Lower bound (inclusive). If None, raises ValueError.

high: Any
Upper bound (inclusive). If None, raises ValueError.
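Predicates compose with the standard operators into a single Condition. A hypothetical sketch (assumes inspect_scout is installed; column names and values are illustrative; definition-only):

```python
# Build a compound filter: recent-ish GPT-4 family runs that failed
# with an error, restricted to two task sets.
def failed_gpt4_runs():
    from inspect_scout import columns as c

    return (
        c.model.like("gpt-4%")                 # SQL LIKE, case-sensitive
        & c.task_set.in_(["math", "cybench"])  # membership test
        & (c.success == False)                 # builds a Condition, not a bool
        & c.error.is_not_null()
    )
```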
Condition
WHERE clause condition that can be combined with others.
class Condition(BaseModel)

Attributes

left: str | 'Condition' | None
Column name (simple) or left operand (compound).

operator: Operator | LogicalOperator | None
Comparison operator (simple) or logical operator (compound).

right: 'Condition' | list[ScalarValue] | tuple[ScalarValue, ScalarValue] | ScalarValue
Comparison value (simple) or right operand (compound).

is_compound: bool
True for AND/OR/NOT conditions, False for simple comparisons.

params: list[ScalarValue]
SQL parameters extracted from the condition for parameterized queries.
Columns
Entry point for building filter expressions.
Supports both dot notation and bracket notation for accessing columns:
from inspect_scout import columns as c
c.column_name
c["column_name"]
c["nested.json.path"]

class Columns

Attributes
transcript_id: Column
Globally unique identifier for transcript.

source_type: Column
Type of transcript source (e.g. "eval_log", "weave", etc.).

source_id: Column
Globally unique identifier of transcript source (e.g. 'eval_id' in Inspect logs).

source_uri: Column
URI for source data (e.g. full path to the Inspect log file or weave op).

date: Column
Date transcript was created.

task_set: Column
Set from which transcript task was drawn (e.g. benchmark name).

task_id: Column
Identifier for task (e.g. dataset sample id).

task_repeat: Column
Repeat for a given task id within a task set (e.g. epoch).

agent: Column
Agent name.

agent_args: Column
Agent args.

model: Column
Model used for eval.

model_options: Column
Generation options for model.

score: Column
Headline score value.

success: Column
Reduction of 'score' to True/False success.

message_count: Column
Messages in conversation.

total_time: Column
Total execution time.

error: Column
Error that halted execution.

limit: Column
Limit that halted execution.
columns
Column selector for where expressions.
Typically aliased to a more compact expression (e.g. c) for use in queries. For example:
from inspect_scout import columns as c
filter = c.model == "gpt-4"
filter = (c.task_set == "math") & (c.epochs > 1)

columns = Columns()

LogColumns
Typed column interface for Inspect log transcripts.
Provides typed properties for standard Inspect log columns while preserving the ability to access custom fields through the base Metadata class methods.
class LogColumns(Columns)

Attributes

transcript_id: Column
Globally unique identifier for transcript.

source_type: Column
Type of transcript source (e.g. "eval_log", "weave", etc.).

source_id: Column
Globally unique identifier of transcript source (e.g. 'eval_id' in Inspect logs).

source_uri: Column
URI for source data (e.g. full path to the Inspect log file or weave op).

date: Column
Date transcript was created.

task_set: Column
Set from which transcript task was drawn (e.g. benchmark name).

task_id: Column
Identifier for task (e.g. dataset sample id).

task_repeat: Column
Repeat for a given task id within a task set (e.g. epoch).

agent: Column
Agent name.

agent_args: Column
Agent args.

model: Column
Model used for eval.

model_options: Column
Generation options for model.

score: Column
Headline score value.

success: Column
Reduction of 'score' to True/False success.

message_count: Column
Messages in conversation.

total_time: Column
Total execution time.

error: Column
Error that halted execution.

limit: Column
Limit that halted execution.

sample_id: Column
Unique id for sample.

eval_id: Column
Globally unique id for eval.

eval_status: Column
Status of eval.

log: Column
Location that the log file was read from.

eval_tags: Column
Tags associated with evaluation run.

eval_metadata: Column
Additional eval metadata.

task_args: Column
Task arguments.

generate_config: Column
Generate config specified for model instance.

model_roles: Column
Model roles.

id: Column
Unique id for sample.

epoch: Column
Epoch number for sample.

input: Column
Sample input.

target: Column
Sample target.

sample_metadata: Column
Sample metadata.

working_time: Column
Time spent working (model generation, sandbox calls, etc.).
log_columns
Log columns selector for where expressions.
Typically aliased to a more compact expression (e.g. c) for use in queries. For example:
from inspect_scout import log_columns as c
# typed access to standard fields
filter = c.model == "gpt-4"
filter = (c.task_set == "math") & (c.epochs > 1)
# dynamic access to custom fields
filter = c["custom_field"] > 100

log_columns = LogColumns()

Messages
transcript_messages
Yield pre-rendered message segments from a transcript.
Automatically selects the best extraction strategy based on what data is available on the transcript:
- If timelines are present, delegates to timeline_messages()
- If events are present (no timelines), delegates to segment_messages() with compaction handling
- If only messages are present, delegates to segment_messages() for context window segmentation only
By default, scorer events are excluded from extraction. This applies to both the timeline path (the scorers span is pruned) and the events path (the scorers section is removed).
Since TimelineMessages is structurally compatible with MessagesSegment, callers get a uniform interface. Those needing span context can isinstance-check for TimelineMessages.
async def transcript_messages(
    transcript: "Transcript",
    *,
    messages_as_str: MessagesAsStr,
    model: Model | str | None = None,
    context_window: int | None = None,
    compaction: Literal["all", "last"] | int = "all",
    depth: int | None = None,
    include_scorers: bool = False,
) -> AsyncIterator[MessagesSegment]

transcript: 'Transcript'
The transcript to extract messages from.

messages_as_str: MessagesAsStr
Rendering function from message_numbering().

model: Model | str | None
The model used for scanning.

context_window: int | None
Override for the model's context window size.

compaction: Literal['all', 'last'] | int
How to handle compaction boundaries when extracting messages from events.

depth: int | None
Maximum depth of the span tree to process when timelines are present. Ignored for events-only or messages-only paths.

include_scorers: bool
Whether to include scorer events in message extraction. Defaults to False.
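A hypothetical iteration sketch (assumes inspect_scout is installed; the model name is illustrative and messages_as_str is a rendering function obtained from message_numbering(); definition-only):

```python
# Collect the rendered text of each message segment for scanning.
async def collect_segments(transcript, messages_as_str):
    from inspect_scout import transcript_messages  # assumed import path

    texts = []
    async for segment in transcript_messages(
        transcript,
        messages_as_str=messages_as_str,
        model="openai/gpt-4o",   # illustrative model name
        compaction="last",       # only the final compaction region
    ):
        texts.append(segment.messages_str)
    return texts
```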
timeline_messages
Yield pre-rendered message segments from timeline spans.
Walks the span tree, passes each non-utility span with direct ModelEvent content to segment_messages() for message extraction and context window segmentation. Each yielded item includes the span context alongside the pre-rendered text.
To filter which spans are processed, use filter_timeline() before calling this function.
async def timeline_messages(
    timeline: Timeline | TimelineSpan,
    *,
    messages_as_str: MessagesAsStr,
    model: Model | str | None = None,
    context_window: int | None = None,
    compaction: Literal["all", "last"] | int = "all",
    depth: int | None = None,
) -> AsyncIterator[TimelineMessages]

timeline: Timeline | TimelineSpan
The timeline (or a specific span subtree) to extract messages from. If a Timeline, starts from timeline.root.

messages_as_str: MessagesAsStr
Rendering function from message_numbering() that formats messages with globally unique IDs.

model: Model | str | None
The model used for scanning. Provides count_tokens() for measuring rendered text.

context_window: int | None
Override for the model's context window size (in tokens). When None, looked up via get_model_info(). An 80% discount factor is applied to leave room for system prompts and scanning overhead.

compaction: Literal['all', 'last'] | int
How to handle compaction boundaries when extracting messages from span events.

depth: int | None
Maximum depth of the span tree to process. depth=1 processes only the root span, depth=2 includes immediate children, etc. None (default) recurses without limit.
segment_messages
Render messages and split them into segments that fit within a token budget.
Renders each message individually via messages_as_str, counts tokens in parallel via tg_collect, then accumulates segments that fit within the effective budget (context window * 80%).
When given events or a TimelineSpan, delegates to span_messages() to extract and merge messages (handling compaction boundaries), then segments the result.
async def segment_messages(
    source: list[ChatMessage] | list[Event] | TimelineSpan,
    *,
    messages_as_str: MessagesAsStr,
    model: Model | str | None = None,
    context_window: int | None = None,
    compaction: Literal["all", "last"] | int = "all",
) -> AsyncIterator[MessagesSegment]

source: list[ChatMessage] | list[Event] | TimelineSpan
A list of ChatMessage, a list of Event, or a TimelineSpan. Events and spans are processed via span_messages() first.

messages_as_str: MessagesAsStr
Rendering function from message_numbering(). Must be called sequentially to preserve counter ordering.

model: Model | str | None
Model used for token counting.

context_window: int | None
Override for context window size. If None, looked up via get_model_info(model).

compaction: Literal['all', 'last'] | int
How to handle compaction boundaries when source contains events. Passed through to span_messages().
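The effective budget used for segmentation (context window times an 80% discount) can be illustrated with a small helper. This function is hypothetical, not part of the API:

```python
def effective_budget(context_window: int, discount: float = 0.8) -> int:
    # Reserve 20% of the context window for system prompts and scanning
    # overhead; the remainder is available for rendered messages.
    return int(context_window * discount)
```

For example, a 128,000-token window leaves 102,400 tokens for message segments.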
span_messages
Extract messages from a span or event list, handling compaction.
Filters for ModelEvent and CompactionEvent, then merges messages into a single list based on the compaction strategy.
def span_messages(
    source: Timeline | TimelineSpan | list[Event],
    *,
    compaction: Literal["all", "last"] | int = "all",
    split_compactions: bool = False,
) -> list[ChatMessage] | list[list[ChatMessage]]

source: Timeline | TimelineSpan | list[Event]
A Timeline (extracts .root), a TimelineSpan (events extracted from its content), or a raw list of events. Non-Model/Compaction events are ignored.

compaction: Literal['all', 'last'] | int
How to handle compaction boundaries:
- "all": merge across boundaries for full coverage. Summary grafts pre + post messages. Trim prepends the trimmed prefix. Edit is transparent.
- "last": ignore compaction history, return only the last ModelEvent's input + output.
- int: keep the last N compaction regions. 1 is equivalent to "last". If N exceeds the number of regions, the result is the same as "all".

split_compactions: bool
When True, return one inner list per compaction region instead of merging into a flat list. The compaction parameter still controls how many regions to keep before splitting.
MessagesSegment
A segment of rendered messages that fits within a token budget.
@dataclass(frozen=True)
class MessagesSegment

TimelineMessages
A segment of messages from a specific timeline span.
Structurally compatible with MessagesSegment (shares messages, messages_str, segment fields) with additional span context. Can be used anywhere a MessagesSegment is expected via duck typing.
@dataclass(frozen=True)
class TimelineMessages

Observe
observe
Observe decorator/context manager for transcript capture.
Works as a decorator (@observe, @observe(), @observe(task_set="x")) or a context manager (async with observe():).
Uses implicit leaf detection: the innermost observe context (one with no children) triggers transcript write to the database. This allows nesting observe contexts where the outer context sets shared parameters and inner contexts represent individual transcript entries.
def observe(
func: Callable[OP, Awaitable[OR]] | None = None,
*,
provider: (
Literal["inspect", "openai", "anthropic", "google"]
| ObserveProvider
| Sequence[ObserveProviderName | ObserveProvider]
| None
) = "inspect",
db: str | TranscriptsDB | None = None,
# TranscriptInfo fields (ordered to match TranscriptInfo for consistency)
source_type: str = "observe",
source_id: str | None = None,
source_uri: str | None = None,
task_set: str | None = None,
task_id: str | None = None,
task_repeat: int | None = None,
agent: str | None = None,
agent_args: dict[str, Any] | None = None,
model: str | None = None,
model_options: dict[str, Any] | None = None,
metadata: dict[str, Any] | None = None,
# Full TranscriptInfo for advanced use
info: TranscriptInfo | None = None,
) -> Callable[OP, Awaitable[OR]] | _ObserveContextManager

func: Callable[OP, Awaitable[OR]] | None
The async function to decorate (when used as @observe without parens).

provider: Literal['inspect', 'openai', 'anthropic', 'google'] | ObserveProvider | Sequence[ObserveProviderName | ObserveProvider] | None
Provider(s) for capturing LLM calls. Can be a provider name ("inspect", "openai", "anthropic", "google"), an ObserveProvider instance, or a sequence of either. Defaults to "inspect", which captures Inspect AI model.generate() calls. Use other providers to capture direct SDK calls. Can only be set on the root observe.

db: str | TranscriptsDB | None
Transcript database or path for writing. Can be a TranscriptsDB instance or a string path (which will be passed to transcripts_db()). Only valid on the outermost observe; defaults to the project transcripts directory.

source_type: str
Type of source for transcript. Defaults to "observe".

source_id: str | None
Globally unique ID for transcript source (e.g. eval_id).

source_uri: str | None
URI for source data (e.g. log file path).

task_set: str | None
Set from which transcript task was drawn (e.g. benchmark name).

task_id: str | None
Identifier for task (e.g. dataset sample id).

task_repeat: int | None
Repeat for a given task id within a task set (e.g. epoch).

agent: str | None
Agent used to execute task.

agent_args: dict[str, Any] | None
Arguments passed to create agent.

model: str | None
Main model used by agent.

model_options: dict[str, Any] | None
Generation options for main model.

metadata: dict[str, Any] | None
Transcript source specific metadata (merged with parent).

info: TranscriptInfo | None
Full TranscriptInfo for advanced use (fields override parent, explicit args override info).
observe_update
Update the current observe context’s TranscriptInfo fields.
Call this from within an @observe decorated function or observe() context to set transcript fields after execution (e.g., score, success, limit).
def observe_update(
*,
source_type: str | None = None,
source_id: str | None = None,
source_uri: str | None = None,
task_set: str | None = None,
task_id: str | None = None,
task_repeat: int | None = None,
agent: str | None = None,
agent_args: dict[str, Any] | None = None,
model: str | None = None,
model_options: dict[str, Any] | None = None,
score: JsonValue | None = None,
success: bool | None = None,
limit: str | None = None,
metadata: dict[str, Any] | None = None,
) -> None

source_type: str | None
Type of source for transcript.

source_id: str | None
Globally unique ID for transcript source.

source_uri: str | None
URI for source data.

task_set: str | None
Set from which transcript task was drawn.

task_id: str | None
Identifier for task.

task_repeat: int | None
Repeat for a given task id within a task set.

agent: str | None
Agent used to execute task.

agent_args: dict[str, Any] | None
Arguments passed to create agent.

model: str | None
Main model used by agent.

model_options: dict[str, Any] | None
Generation options for main model.

score: JsonValue | None
Value indicating score on task.

success: bool | None
Boolean reduction of score to succeeded/failed.

limit: str | None
Limit that caused the task to exit.

metadata: dict[str, Any] | None
Transcript source specific metadata (merged, not replaced).
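A hypothetical sketch combining observe() and observe_update(): the decorated function is captured as a transcript, and outcome fields are recorded after execution (assumes inspect_scout is installed; task names, model, and scores are illustrative; definition-only):

```python
# Capture one transcript per solve() call and record its outcome.
async def run_task(sample_id: str):
    from inspect_scout import observe, observe_update

    @observe(task_set="my_benchmark", task_id=sample_id, model="openai/gpt-4o")
    async def solve():
        answer = ...  # agent loop with model.generate() calls goes here
        # set transcript fields after execution
        observe_update(score=1, success=True)
        return answer

    return await solve()
```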
ObserveProvider
Protocol for LLM capture providers.
@runtime_checkable
class ObserveProvider(Protocol)

Methods
- install

Install hooks/patches for capturing LLM calls.

Called once per provider class. Implementations should be idempotent.

def install(self, emit: ObserveEmit) -> None

emit: ObserveEmit
Sync callback to emit raw captured data. Call with a dict containing request/response data. The framework handles context checking; emit() is a no-op if not inside an observe context. The dict structure is provider-defined and passed to build_event().
- build_event

Convert raw captured data to an Inspect Event.

Called by the framework at observe exit for each captured item. This is where async conversion (using Inspect AI converters) happens.

async def build_event(self, data: dict[str, Any]) -> Event

data: dict[str, Any]
The dict passed to emit() during capture.
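A minimal provider sketch satisfying the protocol structurally. This is a toy: a real provider would patch an SDK in install() and convert the payload to an Inspect Event in build_event():

```python
from typing import Any, Callable

ObserveEmit = Callable[[dict[str, Any]], None]

class EchoProvider:
    """Toy provider: records the emit callback and echoes data through."""

    _installed = False

    def install(self, emit: ObserveEmit) -> None:
        # idempotent: install hooks only once per provider class
        if EchoProvider._installed:
            return
        EchoProvider._installed = True
        self._emit = emit

    async def build_event(self, data: dict[str, Any]) -> Any:
        # a real implementation would convert the raw payload into an
        # Inspect Event here; this sketch just returns it unchanged
        return data
```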
ObserveEmit
Sync function to emit raw captured data. Called by provider wrappers.
ObserveEmit = Callable[[dict[str, Any]], None]