from inspect_viz import Data
from inspect_viz.view.beta import scores_by_factor
evals = Data.from_file("evals-hint.parquet")
scores_by_factor(evals, "task_arg_hint", ("No hint", "Hint"))
Scores by Factor
Overview
The scores_by_factor() function renders a bar plot for comparing eval scores by model and a boolean factor (e.g. non-reasoning vs. reasoning, no hint vs. hint, etc.).
Data Preparation
Above we read the data for the plot from a parquet file. This file was in turn created by:
1. Reading logs into a data frame with evals_df().
2. Using the prepare() function to add model_info() and log_viewer() columns to the data frame.
from inspect_ai.analysis import evals_df, log_viewer, model_info, prepare
df = evals_df("logs")
df = prepare(df,
    model_info(),
    log_viewer("eval", {"logs": "https://samples.meridianlabs.ai/"}),
)
df.to_parquet("evals-hint.parquet")
You can additionally use the task_info() operation to map lower-level task names to task display names (e.g. “gpqa_diamond” -> “GPQA Diamond”).
You should also ensure that your evals data frame has a boolean field corresponding to the factor you are splitting on (in the example above this is “task_arg_hint”).
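If the factor column was logged as something other than a strict boolean (for example the strings "true" and "false", or a missing value when the task argument was not passed), it can be coerced before the data is written out. The helper below is a hypothetical sketch and is not part of inspect_ai or inspect_viz; the accepted spellings and the choice to treat missing values as False are assumptions you may need to adjust.

```python
def to_bool_factor(value: object) -> bool:
    """Coerce a raw factor value to a strict bool (hypothetical helper).

    Assumption: missing values (None, NaN) mean the factor was off.
    """
    if value is None:
        return False
    if isinstance(value, bool):
        return value
    if isinstance(value, float) and value != value:  # NaN is never equal to itself
        return False
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"true", "1", "yes"}:
            return True
        if text in {"false", "0", "no", ""}:
            return False
        raise ValueError(f"Unrecognized factor value: {value!r}")
    # Fall back to Python truthiness for ints and other scalar types.
    return bool(value)


raw = [True, "false", None, "True", 0]
coerced = [to_bool_factor(v) for v in raw]
print(coerced)
```

With a pandas data frame, a helper like this could be applied with df["task_arg_hint"] = df["task_arg_hint"].map(to_bool_factor) before calling df.to_parquet().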
Function Reference
Summarize eval scores with a factor of variation (e.g. ‘No hint’ vs. ‘Hint’).
def scores_by_factor(
    data: Data,
    factor: str,
    factor_labels: tuple[str, str],
    score_value: str = "score_headline_value",
    score_stderr: str = "score_headline_stderr",
    score_label: str = "Score",
    model: str = "model",
    model_label: str = "Model",
    ci: bool | float = 0.95,
    color: str | tuple[str, str] = "#3266ae",
    title: str | Mark | None = None,
    marks: Marks | None = None,
    width: float | Param | None = None,
    height: float | Param | None = None,
    **attributes: Unpack[PlotAttributes],
) -> Component
data
Data
Evals data table. This is typically created using a data frame read with the Inspect evals_df() function.
factor
str
Field with factor of variation (should be of type boolean).
factor_labels
tuple[str, str]
Tuple of labels for factor of variation. The False value should be first, e.g. ("No hint", "Hint").
score_value
str
Name of field for x (scoring) axis (defaults to “score_headline_value”).
score_stderr
str
Name of field for scoring stderr (defaults to “score_headline_stderr”).
score_label
str
Label for x-axis (defaults to “Score”).
model
str
Name of field for y axis (defaults to “model”).
model_label
str
Label for y axis (defaults to “Model”).
ci
bool | float
Confidence interval (e.g. 0.80, 0.90, 0.95, etc.). Defaults to 0.95.
color
str | tuple[str, str]
Hex color value (or tuple of two values). If one value is provided the second is computed by lightening the main color.
title
str | Mark | None
Title for plot (str or mark created with the title() function).
marks
Marks | None
Additional marks to include in the plot.
width
float | Param | None
The outer width of the plot in pixels, including margins. Defaults to 700.
height
float | Param | None
The outer height of the plot in pixels, including margins. Defaults to 65 pixels for each item on the “y” axis.
**attributes
Unpack[PlotAttributes]
Additional PlotAttributes.
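The ci and score_stderr parameters work together: the standard error is what gets widened into an interval around each score. As a rough illustration of that relationship, the sketch below converts a score and its standard error into interval bounds using a normal approximation. This is an assumption about how such intervals are commonly computed, not a description of the library's internals, and ci_bounds is a hypothetical helper.

```python
from statistics import NormalDist


def ci_bounds(score: float, stderr: float, ci: float = 0.95) -> tuple[float, float]:
    """Return (lower, upper) for a two-sided normal-approximation interval."""
    if not 0.0 < ci < 1.0:
        raise ValueError("ci must be strictly between 0 and 1")
    # Two-sided interval: (1 - ci) / 2 of the probability mass sits in each tail.
    z = NormalDist().inv_cdf(0.5 + ci / 2.0)
    return score - z * stderr, score + z * stderr


# Illustrative values only: a score of 0.62 with a standard error of 0.03.
lower, upper = ci_bounds(0.62, 0.03, ci=0.95)
print(f"{lower:.3f} to {upper:.3f}")
```

A lower ci value (e.g. 0.80) gives a smaller multiplier and therefore narrower bounds for the same standard error.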
Implementation
The Scores by Factor example demonstrates how this view was implemented using lower level plotting components.