from inspect_viz import Data
from inspect_viz.view.beta import scores_by_factor
evals = Data.from_file("evals-hint.parquet")
scores_by_factor(evals, "task_arg_hint", ("No hint", "Hint"))
Scores by Factor
Overview
The scores_by_factor() function renders a bar plot for comparing eval scores by model and a boolean factor (e.g. non-reasoning vs. reasoning, no hint vs. hint, etc.).
Data Preparation
Above we read the data for the plot from a parquet file. This file was in turn created by:
Reading logs into a data frame with evals_df().
Using the prepare() function to add model_info() and log_viewer() columns to the data frame.
from inspect_ai.analysis import evals_df, log_viewer, model_info, prepare
df = evals_df("logs")
df = prepare(df, [
model_info(),
log_viewer("eval", {"logs": "https://samples.meridianlabs.ai/"})
])
df.to_parquet("evals-hint.parquet")
You can additionally use the task_info() operation to map lower-level task names to task display names (e.g. “gpqa_diamond” -> “GPQA Diamond”).
You should also ensure that your evals data frame has a boolean field corresponding to the factor you are splitting on (in the example above this is “task_arg_hint”).
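If the factor column was read as strings rather than booleans, it can be coerced before writing the parquet file. The following is a minimal sketch using pandas; the sample rows and the to_bool_factor() helper are hypothetical and are not part of inspect_ai or inspect_viz.

```python
import pandas as pd

# Hypothetical evals data: "task_arg_hint" arrived as strings, not booleans.
df = pd.DataFrame(
    {
        "model": ["model-a", "model-a", "model-b", "model-b"],
        "task_arg_hint": ["false", "true", "False", "True"],
        "score_headline_value": [0.41, 0.58, 0.37, 0.62],
    }
)


def to_bool_factor(column: pd.Series) -> pd.Series:
    """Coerce a column of true/false-like values to a boolean dtype.

    Hypothetical helper for illustration only.
    """
    if column.dtype == bool:
        return column
    # Normalize case and whitespace so "True", " true" and "TRUE" all match.
    normalized = column.astype(str).str.strip().str.lower()
    unknown = set(normalized) - {"true", "false"}
    if unknown:
        # Fail loudly instead of silently mapping unexpected values to False.
        raise ValueError(f"Unexpected factor values: {sorted(unknown)}")
    return (normalized == "true").astype(bool)


df["task_arg_hint"] = to_bool_factor(df["task_arg_hint"])
```

Raising on unexpected values is a deliberate choice here: a silent fallback to False would place those rows under the first factor label without any warning.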
Function Reference
Summarize eval scores with a factor of variation (e.g. ‘No hint’ vs. ‘Hint’).
def scores_by_factor(
data: Data,
factor: str,
factor_labels: tuple[str, str],
score_value: str = "score_headline_value",
score_stderr: str = "score_headline_stderr",
score_label: str = "Score",
model: str = "model",
model_label: str = "Model",
ci: bool | float = 0.95,
color: str | tuple[str, str] = "#3266ae",
title: str | Mark | None = None,
marks: Marks | None = None,
width: float | Param | None = None,
height: float | Param | None = None,
legend: Legend | NotGiven | None = NOT_GIVEN,
**attributes: Unpack[PlotAttributes],
) -> Component
data (Data)
Evals data table. This is typically created using a data frame read with the inspect evals_df() function.
factor (str)
Field with factor of variation (should be of type boolean).
factor_labels (tuple[str, str])
Tuple of labels for factor of variation. False value should be first, e.g. ("No hint", "Hint").
score_value (str)
Name of field for x (scoring) axis (defaults to “score_headline_value”).
score_stderr (str)
Name of field for scoring stderr (defaults to “score_headline_stderr”).
score_label (str)
Label for x-axis (defaults to “Score”).
model (str)
Name of field for y axis (defaults to “model”).
model_label (str)
Label for y axis (defaults to “Model”).
ci (bool | float)
Confidence interval (e.g. 0.80, 0.90, 0.95, etc.). Defaults to 0.95.
color (str | tuple[str, str])
Hex color value (or tuple of two values). If one value is provided the second is computed by lightening the main color.
title (str | Mark | None)
Title for plot (str or mark created with the title() function).
marks (Marks | None)
Additional marks to include in the plot.
width (float | Param | None)
The outer width of the plot in pixels, including margins. Defaults to 700.
height (float | Param | None)
The outer height of the plot in pixels, including margins. Defaults to 65 pixels for each item on the “y” axis.
legend (Legend | NotGiven | None)
Options for the legend. Pass None to disable the legend.
**attributes (Unpack[PlotAttributes])
Additional `PlotAttributes`.
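To make the relationship between score_value, score_stderr, and ci concrete, the sketch below derives interval bounds from a score and its standard error. It assumes a normal approximation (score ± z × stderr); the documentation above does not state how the view computes its intervals, so treat this as an illustration, and note that confidence_interval() is a hypothetical helper rather than part of inspect_viz.

```python
from statistics import NormalDist


def confidence_interval(
    score: float, stderr: float, ci: float = 0.95
) -> tuple[float, float]:
    """Return (low, high) bounds for a score under a normal approximation.

    Hypothetical helper: an assumption about how a confidence level could
    be turned into bounds, not the library's documented method.
    """
    # Two-sided interval: put (1 - ci) / 2 of the probability in each tail.
    z = NormalDist().inv_cdf(0.5 + ci / 2)
    return score - z * stderr, score + z * stderr


# Example values chosen for illustration only.
low, high = confidence_interval(0.60, 0.05)
low_80, high_80 = confidence_interval(0.60, 0.05, ci=0.80)
```

Under this assumption a lower confidence level such as 0.80 yields a narrower interval than 0.95 for the same standard error.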
Implementation
The Scores by Factor example demonstrates how this view was implemented using lower level plotting components.