from inspect_viz import Data
from inspect_viz.view.beta import sample_tool_calls
tools = Data.from_file("cybench_tools.parquet")
sample_tool_calls(tools)
Sample Tool Calls
Overview
The sample_tool_calls() function creates a heat map visualising tool calls over evaluation turns for each sample.
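As a rough illustration of the shape of data behind such a heat map, the sketch below groups message rows into one cell per sample and message position, keyed by tool name. The rows and the helper name tool_call_grid are hypothetical; only the field names ("id", "order", "tool_call_function") are taken from the function's documented defaults. It is not how the view itself is implemented.

```python
from collections import defaultdict

# Hypothetical message rows; field names follow the documented defaults
# of sample_tool_calls() (y="id", x="order", tool="tool_call_function").
rows = [
    {"id": "sample-1", "order": 1, "tool_call_function": None},
    {"id": "sample-1", "order": 2, "tool_call_function": "bash"},
    {"id": "sample-1", "order": 3, "tool_call_function": "python"},
    {"id": "sample-2", "order": 1, "tool_call_function": None},
    {"id": "sample-2", "order": 2, "tool_call_function": "bash"},
]


def tool_call_grid(rows):
    """Map each sample id to {message order: tool name}, skipping rows without a tool call."""
    grid = defaultdict(dict)
    for row in rows:
        tool = row["tool_call_function"]
        if tool is not None:
            grid[row["id"]][row["order"]] = tool
    return dict(grid)


grid = tool_call_grid(rows)
for sample_id, cells in grid.items():
    print(sample_id, cells)
```

Each entry in a sample's dictionary corresponds to one cell on that sample's row of the heat map, with the tool name determining its color.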
Data Preparation
To create the plot we read a raw messages data frame from an eval log using the messages_df() function, then filter down to just the fields we require for visualization:
from inspect_ai.analysis import (
    EvalModel,
    MessageColumns,
    SampleSummary,
    log_viewer,
    messages_df,
    model_info,
    prepare,
)
# read messages from log
log = "<path-to-log>.eval"
# Be sure to add EvalModel column so links can be prepared
df = messages_df(log, columns=EvalModel + SampleSummary + MessageColumns)
# trim columns
df = df[[
    "eval_id",
    "sample_id",
    "message_id",
    "model",
    "id",
    "order",
    "tool_call_function",
    "limit",
    "log"
]]
# prepare the data frame with model info and log links
df = prepare(df, [
    model_info(),
    log_viewer("message", url_mappings={
        "logs": "https://samples.meridianlabs.ai/"
    })
])
# write to parquet
df.to_parquet("cybench_tools.parquet")
Note that the trimming of columns is particularly important because Inspect Viz embeds datasets directly in the web pages that host them (so we want to minimize their size for page load performance and bandwidth usage).
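Because the plot looks fields up by name, it can be useful to confirm that trimming has not dropped a field the view needs before writing the parquet file. The following is a minimal sketch of such a check; missing_fields and REQUIRED_FIELDS are hypothetical names, and the required set is assumed from the default arguments in the function signature rather than from any documented validation step.

```python
# Field names that sample_tool_calls() reads by default, per its signature
# (x="order", y="id", tool="tool_call_function", limit="limit").
REQUIRED_FIELDS = ("order", "id", "tool_call_function", "limit")


def missing_fields(columns, required=REQUIRED_FIELDS):
    """Return the required field names absent from `columns`, in `required` order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Columns kept by the trimming step in the data preparation code.
trimmed_columns = [
    "eval_id", "sample_id", "message_id", "model",
    "id", "order", "tool_call_function", "limit", "log",
]

missing = missing_fields(trimmed_columns)
if missing:
    raise ValueError(f"trimmed data frame is missing fields: {missing}")
```

With a pandas data frame, the same check would be called as missing_fields(df.columns). If non-default field names are passed to the function, the required tuple would need to change to match.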
Function Reference
Heat map visualising tool calls over evaluation turns.
def sample_tool_calls(
    data: Data,
    x: str = "order",
    y: str = "id",
    tool: str = "tool_call_function",
    limit: str = "limit",
    tools: list[str] | None = None,
    x_label: str | None = "Message",
    y_label: str | None = "Sample",
    title: str | Title | None = None,
    marks: Marks | None = None,
    width: float | None = None,
    height: float | None = None,
    legend: Legend | NotGiven | None = NOT_GIVEN,
    **attributes: Unpack[PlotAttributes],
) -> Component
data: Data
Messages data table. This is typically created using a data frame read with the inspect messages_df() function.
x: str
Name of field for x axis (defaults to “order”).
y: str
Name of field for y axis (defaults to “id”).
tool: str
Name of field with tool name (defaults to “tool_call_function”).
limit: str
Name of field with sample limit (defaults to “limit”).
tools: list[str] | None
Tools to include in plot (and order to include them). Defaults to all tools found in data.
x_label: str | None
x-axis label (defaults to “Message”).
y_label: str | None
y-axis label (defaults to “Sample”).
title: str | Title | None
Title for plot (str or mark created with the title() function).
marks: Marks | None
Additional marks to include in the plot.
width: float | None
The outer width of the plot in pixels, including margins. Defaults to 700.
height: float | None
The outer height of the plot in pixels, including margins. The default is width / 1.618 (the golden ratio).
legend: Legend | NotGiven | None
Options for the legend. Pass None to disable the legend.
**attributes: Unpack[PlotAttributes]
Additional PlotAttributes. By default, margin_top is set to 0, margin_left to 20, margin_right to 100, color_label is “Tool”, y_ticks is empty, and x_ticks and color_domain are calculated from data.
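Since the tools parameter controls both which tools appear and the order they are listed in, one option is to derive that list from the data. The sketch below orders tool names by how often they are called; tools_by_frequency and the sample names are hypothetical, and this is just one possible ordering, not something the library prescribes.

```python
from collections import Counter


def tools_by_frequency(tool_names):
    """Return distinct tool names, most frequently called first.

    Entries that are None or empty (messages without a tool call) are ignored.
    """
    counts = Counter(name for name in tool_names if name)
    return [name for name, _ in counts.most_common()]


# Hypothetical values of the "tool_call_function" field across messages.
calls = ["bash", "python", "bash", None, "submit", "bash", "python"]
ordered = tools_by_frequency(calls)
print(ordered)

# The resulting list could then be passed to the view, for example:
# sample_tool_calls(tools, tools=ordered)
```

When reading names from a pandas column, missing values are NaN rather than None, so the column would need something like df["tool_call_function"].dropna() before being passed to the helper.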
Implementation
The Sample Tool Calls example demonstrates how this view was implemented using lower-level plotting components.