Chunk token counting in segment_messages so large transcripts don’t flood the connection pool.
Bugfix: Preserve the caller’s protocol (e.g. file://) in scan locations returned by scan listing (regression in v0.4.36 that broke relative folder display in the VS Code scans panel).
0.4.39 (02 June 2026)
Transcript: Resolve ModelEvent input references from the consolidated events_data message and call pools.
Compatibility with Pandas 3.0 and DuckDB 1.5.
0.4.38 (25 May 2026)
Update inspect-ai dependency to 0.3.226
Update inspect-swe dependency to 0.2.56
Bugfix: Correct resolution of relative paths in transcript database index.
0.4.37 (22 May 2026)
Observe: OpenAI Chat Completions and Responses API calls that fail before any usable response now produce a ModelEvent instead of being dropped.
Structured generate: Warn the model about orphan tool_use blocks in the structured_generate retry loop.
Enumerate all CSV, YAML, and JSON files when looking for validation sets (don’t skip gitignored files).
LLM Scanner / messages API: Sizing segments now subtracts the rendered scanner template’s tokens from the per-segment budget, so long templates no longer push the prompt past context_window.
Bugfix: Serialize metadata fields with pydantic_core.to_json
Include task_id and task_repeat in validations data files (as optional informational fields)
Scout View: Don’t allow malformed metadata to blow up entire scan (#245)
Scout View: Refine scanner result header and All Scores dialog (#243)
Scout View: Stop polling scans list (#234)
Scout View: Redesign MetaDataGrid with section cards and striping (#228)
Scout View: Prevent horizontal scroll on messages tab (#225)
Scout View: Scout scan UI fixes and negative filter (#226)
Scout View: Support custom role labels for transcript message rendering.
Bugfix: Stop doing blocking S3 file I/O on hot paths.
LLM Scanner: generate_answer() and structured_generate() accept context_tools — additional ToolInfo definitions declared in the request but never invoked (tool_choice forces the answer tool for structured answers and is "none" for textual answers). This lets callers pass a prompt containing prior tool_use blocks (e.g. when asking a follow-up question about an existing transcript) without the API rejecting the request for referencing undeclared tools.
Transcript: Add span_tools() alongside span_messages() to extract the union of ToolInfo definitions (deduped by name, last-seen wins) from a Timeline/TimelineSpan/list[Event]. Useful for callers that replay or interrogate a recorded conversation and need to re-declare the tools that were in scope.
Scan results: Fix TypeError: int() argument must be ... not 'NAType' raised by _expand_resultset_rows when aligning an all-NA pyarrow column to a non-nullable numpy numeric dtype. Falls back to object dtype when the target dtype cannot hold NA.
LLM Scanner: On timeline scans, reduce within-span chunks to one Result per span before wrapping in a resultset; custom reducers now fire on chunked spans (#431).
LLM Scanner: ResultReducer.llm() now uses the synthesizer’s own reasoning as the reduced Result.explanation, instead of a [Segment N] concat of the per-chunk explanations. The synthesis prompt also instructs the model to preserve [M1]-style message citations so the reduced explanation stays traceable to the merged references.
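The "deduped by name, last-seen wins" rule described for span_tools() above can be illustrated with a minimal sketch. ToolDef and union_tools are hypothetical stand-ins invented for this example; they are not the library's ToolInfo or its implementation, and the output ordering (position of first appearance) is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDef:
    """Hypothetical stand-in for a tool definition (not the library's ToolInfo)."""

    name: str
    description: str


def union_tools(tools_per_event):
    """Union of tool definitions across events, deduped by name (last-seen wins)."""
    by_name = {}
    for tools in tools_per_event:
        for tool in tools:
            # A later definition with the same name replaces the earlier one.
            # The dict keeps the position of the name's first appearance.
            by_name[tool.name] = tool
    return list(by_name.values())


# Two events: the second redeclares "bash" with a different description.
events = [
    [ToolDef("bash", "run a command"), ToolDef("python", "run code")],
    [ToolDef("bash", "run a shell command with a timeout")],
]
tools = union_tools(events)
```

Here tools holds two entries, and the "bash" entry carries the description from the later event.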
0.4.34 (12 May 2026)
Refactor scan panel display for use in Inspect task display.
0.4.33 (11 May 2026)
Scanning: Expose recorder primitives for inspect_ai eval_set scanner integration.
Observe: Stream-capture wrappers now emit the accumulated partial response when the underlying SDK stream raises mid-iteration (e.g. overloaded_error, connection reset, content-filter error SSE event). The resulting ModelEvent carries the partial output with error set; OpenAI mid-stream errors whose code indicates moderation (invalid_prompt, content_policy_violation, content_filter, cyber_policy) are mapped to stop_reason="content_filter".
Scout View: Dark mode in scout viewer (#165)
Scout View: Improve narrow-width viewing (#209)
Scout View: Outline: align close icon with title and make rootHeader sticky (#197)
Scout View: Fix model retry event ordering (#189)
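The stream-capture behavior described in the Observe entry above can be sketched roughly as follows. This is an illustration only, not the library's wrapper: StreamError, capture, and the record dictionary layout are hypothetical, and the fallback stop reasons "unknown" and "stop" are assumptions. Only the four moderation codes come from the entry itself.

```python
from typing import Callable, Iterable, Iterator, Optional

# Error codes the changelog entry lists as indicating moderation.
MODERATION_CODES = frozenset(
    {"invalid_prompt", "content_policy_violation", "content_filter", "cyber_policy"}
)


class StreamError(Exception):
    """Hypothetical SDK error that carries an error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def stop_reason_for(code: Optional[str]) -> str:
    # Assumption: non-moderation failures get a generic "unknown" stop reason.
    return "content_filter" if code in MODERATION_CODES else "unknown"


def capture(stream: Iterable[str], record: Callable[[dict], None]) -> Iterator[str]:
    """Pass chunks through while accumulating them; record partial output on error."""
    parts = []
    try:
        for chunk in stream:
            parts.append(chunk)
            yield chunk
    except Exception as exc:
        code = getattr(exc, "code", None)
        # Emit what was received so far instead of dropping it, then re-raise.
        record(
            {
                "output": "".join(parts),
                "error": str(exc),
                "stop_reason": stop_reason_for(code),
            }
        )
        raise
    else:
        record({"output": "".join(parts), "error": None, "stop_reason": "stop"})


def flaky_stream() -> Iterator[str]:
    """Simulated stream that fails after two chunks."""
    yield "Hello, "
    yield "wor"
    raise StreamError("content_filter")


events = []
received = []
try:
    for chunk in capture(flaky_stream(), events.append):
        received.append(chunk)
except StreamError:
    pass  # the caller still sees the original error
```

After the loop, events contains a single record whose output is the partial text "Hello, wor", with error set and stop_reason mapped to "content_filter".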
0.4.32 (06 May 2026)
LLM Scanner: Redefine depth semantics for timeline scanning. depth now counts levels of scannable spans (top-level agents/solvers and their scannable descendants).
0.4.29 (05 May 2026)
Transcripts: Treat directories with .eval files as an eval log collection (the presence of .parquet files no longer takes precedence).
Dependencies: Sync to semaphore changes in Inspect v0.3.217.
Bugfix: Exit non-zero when --fail-on-error and scan does not complete.
Bugfix: Always reset text progress indicator when job completes.
0.4.28 (29 April 2026)
LLM Scanner: Send default-template prompts as two content blocks (preamble + transcript / per-scanner tail) with cache_prompt=True. On Anthropic, the shared-prefix block is marked for caching so multiple scanners on the same transcript share a cache entry.
LLM Scanner: template parameter accepts a tuple[str, str] of (prefix, suffix) to opt custom templates into the same two-block cache-aware rendering. Passing a single str keeps the legacy single-block behavior unchanged.
Scanners: Run the first scanner alone for each transcript; release the rest only after it completes. Lets the lead’s generate populate the prompt cache so followers hit the warm cache.
Scanner as Scorer: When Result.value is None but the result includes an answer, explanation, or metadata, return a NOANSWER score that preserves those fields instead of dropping the score entirely.
generate_answer() / structured_generate(): add config: GenerateConfig | None parameter for per-call overrides (e.g. cache), so callers no longer need to mutate or copy the role Model to set generation options. parallel_tool_calls remains forced to False for structured answers.
Scout View: Improve message collapse behavior.
Scout View: Fade out bottom of truncated expandable panels to indicate more content below.
Scout View: Fix timeline error markers incorrectly flagging every model and tool event.
Scout View: Store task list filter/sort state independently per scope (Tasks vs Folders, individual folders).
Scout View: Add Tokens and Duration columns to samples list.
Scanner Tools: ResultReducer for reducing results from multiple transcript segments into single results, with built-in majority and LLM-based reducers.
Transcript DB: claude_code() source for importing transcripts from Claude Code session logs. Supports filtering by project, session, and time range, session merging, and image extraction.
CLI: scout import command for importing transcripts from registered sources into Scout projects.
Serialization: Use pa.large_string for string types to support larger column/file sizes.
Multiprocessing: Improve handling of model instances with multiprocessing serialization.
Transcripts: Unthin target and add scores from sample JSON.
Transcripts: Set row group size to 25 (specify as rows not bytes).
Transcripts: Address DuckDB 1.5 compatibility issue w/ mixed type CASE expressions.
Transcripts: Switch over to async ZIP modules (async_zip, zip_common, compression, compression_transcoding, async_bytes_reader) that have migrated to inspect_ai.
View Server: Add optimized /transcripts/{dir}/{id}/info and /transcripts/{dir}/{id}/messages-events endpoints for fetching transcript data. The messages-events endpoint streams raw (potentially compressed) JSON for improved performance.
Bugfix: Eliminate problem with stale transcript status when deleting validation cases.
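The "run the first scanner alone, then release the rest" scheduling described in the Scanners entry above can be sketched with asyncio. This is a simplified illustration under stated assumptions: run_scanners and make_scanner are hypothetical names, and the prompt cache is simulated with a plain set rather than a real model provider cache.

```python
import asyncio

# Simulated prompt cache: transcripts that have already been sent once.
cache = set()


def make_scanner(name):
    """Build a hypothetical scanner that reports whether it hit the warm cache."""

    async def scan(transcript):
        hit = transcript in cache
        await asyncio.sleep(0)  # yield control, as a real model call would
        cache.add(transcript)
        return name, hit

    return scan


async def run_scanners(scanners, transcript):
    """Run the lead scanner to completion, then run the followers concurrently."""
    if not scanners:
        return []
    lead, *followers = scanners
    # The lead runs alone so its request can populate the cache.
    first = await lead(transcript)
    # Followers are released only after the lead has finished.
    rest = await asyncio.gather(*(s(transcript) for s in followers))
    return [first, *rest]


results = asyncio.run(
    run_scanners([make_scanner(n) for n in ("a", "b", "c")], "transcript-1")
)
```

In this simulation only the lead scanner misses the cache; if all three were started together, each would check the cache before any of them had populated it.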
0.4.11 (29 January 2026)
Projects: Always read scout.local.yaml even if there is no scout.yaml file.
Projects: Always apply project level filter to scans (AND combine with scan filters).
Validation: Label validation is now binary: validate true if the label is present with a truthy value; validate false if the label is not present or has only falsey values.
Scan Results: Add exclude_columns parameter for reading parquet results to optionally reduce memory usage.
Scan Results: Pre-fetch optimization for S3/remote parquet files.
Transcript DB: observe() decorator/context manager for writing transcripts based on observed LLM generations.
Transcript DB: langsmith() and logfire() transcript sources for importing transcripts from LLM observability systems.
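The binary label validation rule from the Validation entry above can be written down as a small predicate. This is a sketch only: label_is_true and the dictionary layout with "label" and "value" keys are hypothetical and do not reflect the library's actual result types.

```python
from typing import Any


def label_is_true(results: list[dict[str, Any]], label: str) -> bool:
    """True if the label is present with at least one truthy value.

    False if the label is absent or every value recorded for it is falsy.
    """
    return any(bool(r.get("value")) for r in results if r.get("label") == label)


# Hypothetical scan results: each carries a label and a value.
results = [
    {"label": "deception", "value": False},
    {"label": "deception", "value": True},
    {"label": "refusal", "value": 0},
]

deception = label_is_true(results, "deception")  # present with a truthy value
refusal = label_is_true(results, "refusal")      # present, but only falsy values
missing = label_is_true(results, "sandbagging")  # not present at all
```

A validation case would then compare this boolean with the expected target for the label.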
0.4.10 (21 January 2026)
Scanning: Significantly improve scanning performance for eval logs when events are not needed.
Scan config: Set ‘model’ to None if no model is specified.
Scan config: Deprecate use of environment variables for config (in favor of project config).
Scout View: Move ‘Project’ UI button to main activity bar.
0.4.9 (20 January 2026)
Bugfix: Don’t check index coverage when running with an active limit or other query filter.