from inspect_viz import Data
from inspect_viz.view.beta import sample_heatmap
samples = Data.from_file("writing_bench_samples.parquet")
sample_heatmap(
samples,
score_value="score_multi_scorer_wrapper",
height=200
)Sample Heatmap
Overview
The sample_heatmap()function renders a heatmap for viewing individual sample scores. Each sample is shown along one axis, models are shown along the other axis.
Data Preparation
Above we read the data for the plot from a parquet file. This file was in turn created by:
Reading logs into a dataframe with
samples_df, specifying the SampleSummaryColumns and the EvaalModel columns.Using the
prepare()function tomodel_info()andlog_viewer()columns to the data frame.
from inspect_ai.analysis import samples_df, log_viewer, model_info, prepare, EvalModel, SampleSummary
df = samples_df("logs", columns=SampleSummary + EvalModel)
df = prepare(df, [
model_info(),
log_viewer("sample", {"logs": "https://samples.meridianlabs.ai/"})
])
df.to_parquet("writing_bench_samples.parquet")If your samples are using non-numeric values (for example ‘C’ and ‘I’ to represent correct and incorrect), you can use the score_to_float prepare function in inspect.analysis to convert the score values to floats. For example:
from inspect_ai.analysis import samples_df, log_viewer, model_info, score_to_float, prepare, EvalModel, SampleSummary
df = samples_df("logs", columns=SampleSummary + EvalModel)
df = prepare(df, [
model_info(),
log_viewer("sample", {"logs": "https://samples.meridianlabs.ai/"}),
score_to_float("score_includes")
])Note that both the log viewer links and model names are optional (the plot will render without links and use raw model strings if the data isn’t prepared with log_viewer() and model_info()).
Function Reference
Creates a heatmap plot of success rate of eval data.
def sample_heatmap(
data: Data,
id: str = "id",
id_label: str | None | NotGiven = None,
model_name: str = "model_display_name",
model_label: str | None | NotGiven = None,
score_value: str | None = None,
cell: CellOptions | None = None,
tip: bool = True,
title: str | Title | None = None,
marks: Marks | None = None,
height: float | None = None,
width: float | None = None,
legend: Legend | NotGiven | None = NOT_GIVEN,
sort: Literal["ascending", "descending"] | SortOrder | None = "ascending",
orientation: Literal["horizontal", "vertical"] = "horizontal",
**attributes: Unpack[PlotAttributes],
) -> ComponentdataData-
Evals data table.
idstr-
Name of column to use for displaying the sample id.
id_labelstr | None | NotGiven-
x-axis label (defaults to None).
model_namestr-
Name of column to use for rows.
model_labelstr | None | NotGiven-
y-axis label (defaults to None).
score_valuestr | None-
Name of the column to use as values to determine cell color.
cellCellOptions | None-
Options for the cell marks.
tipbool-
Whether to show a tooltip with the value when hovering over a cell (defaults to True).
titlestr | Title | None-
Title for plot (
stror mark created with the title() function) marksMarks | None-
Additional marks to include in the plot.
heightfloat | None-
The outer height of the plot in pixels, including margins. The default is width / 1.618 (the golden ratio).
widthfloat | None-
The outer width of the plot in pixels, including margins. Defaults to 700.
legendLegend | NotGiven | None-
Options for the legend. Pass None to disable the legend.
sortLiteral['ascending', 'descending'] | SortOrder | None-
Sort order for the x and y axes. If ascending, the highest values will be sorted to the top right. If descending, the highest values will appear in the bottom left. If None, no sorting is applied. If a SortOrder is provided, it will be used to sort the x and y axes.
orientationLiteral['horizontal', 'vertical']-
The orientation of the heatmap. If “horizontal”, the tasks will be on the x-axis and models on the y-axis. If “vertical”, the tasks will be on the y-axis and models on the x-axis.
**attributesUnpack[PlotAttributes]-
Additional `PlotAttributes
Implementation
The Sample Heatmap example demonstrates how this view was implemented using lower level plotting components.