Scores by Model

Overview

The scores_by_model() function creates a horizontal bar plot for comparing the scores of different models on a single evaluation, with one or more baselines overlaid as vertical lines. For example:

```python
from inspect_viz import Data
from inspect_viz.view.beta import scores_by_model
from inspect_viz.mark import baseline

evals = Data.from_file("agi-lsat-ar.parquet")
scores_by_model(evals, marks=baseline(0.697, label="Human"))
```
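To overlay more than one baseline, a minimal sketch, assuming the marks parameter accepts a list of marks (the 0.25 random-guessing value is illustrative, not from the data):

```python
from inspect_viz import Data
from inspect_viz.view.beta import scores_by_model
from inspect_viz.mark import baseline

evals = Data.from_file("agi-lsat-ar.parquet")

# Overlay two reference lines; only the 0.697 human baseline
# comes from the example above, the other value is hypothetical.
scores_by_model(
    evals,
    marks=[
        baseline(0.697, label="Human"),
        baseline(0.25, label="Random"),  # hypothetical chance-level value
    ],
)
```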
Data Preparation

Above we read the data for the plot from a parquet file. This file was in turn created by:

1. Reading logs into a data frame with evals_df().
2. Using the prepare() function to add model_info() and log_viewer() columns to the data frame.
```python
from inspect_ai.analysis import evals_df, log_viewer, model_info, prepare

df = evals_df("logs")
df = prepare(df,
    model_info(),
    log_viewer("eval", {"logs": "https://samples.meridianlabs.ai/"}),
)
df.to_parquet("agi-lsat-ar.parquet")
```
You can additionally use the task_info() operation to map lower-level task names to task display names (e.g. "gpqa_diamond" -> "GPQA Diamond").
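A minimal sketch of adding task_info() to the preparation pipeline above (assuming it composes with prepare() like the other operations; we call it with no arguments here since its options aren't covered in this section):

```python
from inspect_ai.analysis import (
    evals_df, log_viewer, model_info, prepare, task_info
)

df = evals_df("logs")
df = prepare(df,
    model_info(),
    task_info(),  # map task names to display names (e.g. "gpqa_diamond" -> "GPQA Diamond")
    log_viewer("eval", {"logs": "https://samples.meridianlabs.ai/"}),
)
df.to_parquet("agi-lsat-ar.parquet")
```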
Note that both the log viewer links and model names are optional (the plot will render without links and use raw model strings if the data isn't prepared with log_viewer() and model_info()).
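For instance, a sketch of plotting unprepared data (assuming Data.from_dataframe() is available for reading a data frame directly, and that "model" is the raw model string column in evals_df() output; both are assumptions here):

```python
from inspect_ai.analysis import evals_df
from inspect_viz import Data
from inspect_viz.view.beta import scores_by_model

# No prepare() step: the plot renders without log viewer links
# and shows raw model strings rather than display names.
df = evals_df("logs")
evals = Data.from_dataframe(df)
scores_by_model(evals, model_name="model")
```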
Function Reference

Bar plot for comparing the scores of different models on a single evaluation.

Summarize eval scores using a bar plot. By default, scores (x) are plotted by "model_display_name" (y). By default, confidence intervals are also plotted (disable this with y_ci=False).

```python
def scores_by_model(
    data: Data,
    *,
    model_name: str = "model_display_name",
    score_value: str = "score_headline_value",
    score_stderr: str = "score_headline_stderr",
    ci: float = 0.95,
    sort: Literal["asc", "desc"] | None = None,
    score_label: str | None | NotGiven = None,
    model_label: str | None | NotGiven = None,
    color: str | None = None,
    title: str | Title | None = None,
    marks: Marks | None = None,
    width: float | None = None,
    height: float | None = None,
    **attributes: Unpack[PlotAttributes],
) -> Component
```
| Parameter | Type | Description |
|---|---|---|
| data | Data | Evals data table. This is typically created using a data frame read with the inspect evals_df() function. |
| model_name | str | Column containing the model name (defaults to "model_display_name"). |
| score_value | str | Column containing the score value (defaults to "score_headline_value"). |
| score_stderr | str | Column containing the score standard error (defaults to "score_headline_stderr"). |
| ci | float | Confidence interval (e.g. 0.80, 0.90, 0.95, etc.). Defaults to 0.95. |
| sort | Literal['asc', 'desc'] \| None | Sort order for the bars (sorts using the 'x' value). Can be "asc" or "desc". Defaults to "asc". |
| score_label | str \| None \| NotGiven | x-axis label (defaults to None). |
| model_label | str \| None \| NotGiven | y-axis label (defaults to None). |
| color | str \| None | The color for the bars. Defaults to "#416AD0". Pass any valid hex color value. |
| title | str \| Title \| None | Title for plot (str or mark created with the title() function). |
| marks | Marks \| None | Additional marks to include in the plot. |
| width | float \| None | The outer width of the plot in pixels, including margins. Defaults to 700. |
| height | float \| None | The outer height of the plot in pixels, including margins. The default is width / 1.618 (the golden ratio). |
| **attributes | Unpack[PlotAttributes] | Additional PlotAttributes. By default, y_inset_top and margin_bottom are set to 10 pixels and x_ticks is set to []. |
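As a usage sketch, here is a call that exercises several of these parameters (the title, the 0.90 interval, and the sizing are illustrative choices, not defaults):

```python
from inspect_viz import Data
from inspect_viz.view.beta import scores_by_model

evals = Data.from_file("agi-lsat-ar.parquet")

scores_by_model(
    evals,
    ci=0.90,              # 90% confidence interval instead of the 0.95 default
    sort="desc",          # highest-scoring model first
    color="#416AD0",      # the default bar color, stated explicitly
    title="AGI LSAT-AR",  # hypothetical plot title
    width=700,
)
```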
Implementation

The Scores by Model example demonstrates how this view was implemented using lower-level plotting components.