Inspect Viz
Data visualization for Inspect AI large language model evaluations.
Welcome
Welcome to Inspect Viz, a data visualisation library for Inspect AI. Inspect Viz provides flexible tools for creating high quality interactive visualisations from Inspect evaluations.
Here’s an Inspect Viz plot created with the scores_timeline() function that compares benchmark scores over time:
Use the filters to switch benchmarks and restrict to models from specific organizations. Hover over the points to get additional details on them or to view the underlying Inspect log for the evals.
Installation
First, install the inspect_viz package from GitHub as follows:
pip install git+https://github.com/meridianlabs-ai/inspect_viz
Inspect Viz plots are interactive Jupyter Widgets and can be authored in a variety of ways:
In any Jupyter Notebook (JupyterLab, VS Code, Colab, etc.)
In VS Code with the Jupyter: Run Current File in Interactive Window command.
In VS Code within a Quarto executable markdown document.
See the article on LLM Assistance for best practices on using language models to help with creating plots. See the articles on Publishing for details on including plots in documents as static images or within websites and dashboards as interactive widgets.
Views
Inspect Viz Views are pre-built plots that work with data created by the Inspect log data frame reading functions. For example, the scores_by_factor() view enables you to compare scores across models and a boolean factor:
from inspect_viz import Data
from inspect_viz.view.beta import scores_by_factor
evals = Data.from_file("evals-hint.parquet")
scores_by_factor(evals, "task_arg_hint", ("No hint", "Hint"))
The tool_calls() view enables you to visualize tool calls by sample:
from inspect_viz.view.beta import tool_calls
tools = Data.from_file("cybench_tools.parquet")
tool_calls(tools)
Available views include:
View | Description |
---|---|
scores_by_task() | Bar plot for comparing eval scores (with confidence intervals) across models and tasks. |
scores_by_factor() | Bar plot for comparing eval scores by model and a boolean factor (e.g. no hint vs. hint). |
scores_by_limit() | Line plot showing success rate by token limit. |
scores_timeline() | Scatter plot with eval scores by model, organization, and release date. Filterable by evaluation and organization. |
scores_heatmap() | Heatmap with values for comparing scores across model and task. |
scores_by_model() | Bar plot for comparing model scores on a single eval. |
tool_calls() | Heat map visualising tool calls over evaluation turns. |
Plots
While pre-built views are useful, you may also want to create your own custom plots. Plots in inspect_viz are composed of one or more marks, which can do either higher level plotting (e.g. dot(), bar_x(), bar_y(), area(), heatmap(), etc.) or lower level drawing on the plot canvas (e.g. text(), image(), arrow(), etc.).
Dot Plot
Here is an example of a simple dot plot using a dataset of GPQA Diamond results:
from inspect_viz import Data
from inspect_viz.plot import plot
from inspect_viz.mark import dot
gpqa = Data.from_file("gpqa.parquet")  # (1)
plot(
    dot(  # (2)
        gpqa,
        x="model_release_date",
        y="score_headline_value",
        fill="model_organization_name",  # (3)
        channels={  # (4)
            "Model": "model_display_name",
            "Score": "score_headline_value",
            "Stderr": "score_headline_stderr",
        }
    ),
    title="GPQA Diamond",
    legend="color",  # (5)
    grid=True,
    x_label="Release Date",
    y_label="Score",
    y_domain=[0,1.0],  # (6)
)
1. Read the dataset from a parquet file. You can also use Data.from_dataframe() to read data from any Pandas, Polars, or PyArrow data frame.
2. Plot using a dot() mark. The plot() function takes one or more marks or interactors.
3. Map the “model_organization_name” column to the fill scale of the plot (causing each organization to have its own color).
4. Show a tooltip with the defined channels.
5. Add a legend to the plot as a key to our color mappings.
6. Ensure that the y-axis goes from 0 to 1.
Bar Plot
Here is a simple horizontal bar plot that compares model scores on AR-LSAT:
from inspect_viz.mark import bar_x
evals = Data.from_file("agi-lsat-ar.parquet")
plot(
    bar_x(
        evals,
        x="score_headline_value",
        y="model_display_name",
        sort={"y": "x", "reverse": True},  # (1)
        fill="#3266ae"
    ),
    title="AR-LSAT",
    x_label="Score",
    y_label=None,  # (2)
    margin_left=120  # (2)
)
1. Sort the bars by score (descending).
2. Y-axis is labeled with model names so remove default label and ensure it has enough margin.
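To make the sort option in the bar plot above concrete, here is a plain-Python sketch of the ordering it asks for: the y-axis categories are arranged by their x value, largest first. This is a conceptual illustration only and does not use or reproduce Inspect Viz internals; the `rows` data and the `order_y_by_x()` helper are hypothetical.

```python
# Conceptual sketch only: orders y-axis categories by their x value.
# `rows` and `order_y_by_x` are hypothetical names used for illustration.

def order_y_by_x(rows, x, y, reverse=False):
    """Return the y categories sorted by their x value."""
    ordered = sorted(rows, key=lambda row: row[x], reverse=reverse)
    return [row[y] for row in ordered]

rows = [
    {"model_display_name": "Model A", "score_headline_value": 0.42},
    {"model_display_name": "Model B", "score_headline_value": 0.77},
    {"model_display_name": "Model C", "score_headline_value": 0.58},
]

# Analogous to sort={"y": "x", "reverse": True}: highest score first.
y_order = order_y_by_x(
    rows, x="score_headline_value", y="model_display_name", reverse=True
)
print(y_order)
```

Sorting by the plotted value rather than alphabetically makes the ranking of models readable at a glance.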
Links
Inspect Viz supports creating direct links from visualizations to published Inspect log transcripts. Links can be made at the eval level, or to individual samples, messages, or events. For example, this plot produced with scores_by_model() includes a link to the underlying logs in its tooltips:
from inspect_viz.view.beta import scores_by_model
scores_by_model(evals)  # baseline=0.91
The pre-built Views all support linking when a log_viewer column is available in the dataset. To learn more about amending datasets with viewer URLs as well as adding linking support to your own plots, see the article on Links.
Filters
Use inputs to enable filtering datasets and dynamically updating plots. For example, if we had multiple benchmarks available for a scores timeline, we could add a select() input for choosing between them:
from inspect_viz.input import select
from inspect_viz.layout import vconcat
benchmarks = Data.from_file("benchmarks.parquet")
vconcat(
    select(
        benchmarks,
        label="Benchmark",
        column="task_name",
        value="auto"
    ),
    plot(
        dot(
            benchmarks,
            x="model_release_date",
            y="score_headline_value",
            fill="model_organization_name",
        ),
        legend="color",
        grid=True,
        x_label="Release Date",
        y_label="Score",
        y_domain=[0,1.0],
        color_domain="fixed"
    )
)
We’ve introduced a few new things here:
The vconcat() function from the layout module lets us stack inputs on top of our plot.
The select() function from the input module binds a select box to the task_name column.
The color_domain="fixed" argument to plot() indicates that we want to preserve model organization colors even when the plot is filtered.
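The reason a fixed color domain matters can be sketched in plain Python. If colors are assigned by position within whatever categories are currently visible, filtering can silently give an organization a different color. The sketch below is a conceptual illustration under that assumption, not a description of how Inspect Viz assigns colors; `PALETTE`, `assign_colors()`, and the sample rows are hypothetical.

```python
# Conceptual sketch only (not Inspect Viz internals): shows why a color
# domain derived from filtered data can reassign colors.
PALETTE = ["blue", "orange", "green"]  # hypothetical palette

def assign_colors(domain):
    """Map each category in `domain` to a palette color by position."""
    return {name: PALETTE[i % len(PALETTE)] for i, name in enumerate(domain)}

def orgs(data):
    """Sorted unique organizations present in `data`."""
    return sorted({row["org"] for row in data})

rows = [
    {"task_name": "gpqa", "org": "Org A"},
    {"task_name": "gpqa", "org": "Org B"},
    {"task_name": "gpqa", "org": "Org C"},
    {"task_name": "mmlu", "org": "Org B"},
    {"task_name": "mmlu", "org": "Org C"},
]

# Simulate choosing one benchmark in the select box.
filtered = [row for row in rows if row["task_name"] == "mmlu"]

# Domain recomputed from the filtered rows: "Org B" shifts to the first color.
dynamic = assign_colors(orgs(filtered))

# Domain fixed from the full dataset: "Org B" keeps its color after filtering.
fixed = assign_colors(orgs(rows))

print(dynamic["Org B"], fixed["Org B"])
```

With the domain held fixed, a reader can switch benchmarks without having to re-learn which color belongs to which organization.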
Marks
So far the plots we’ve created include only a single mark; however, many of the more interesting plots you’ll create will include multiple marks.
For example, here we create a heatmap of evaluation scores by model. There is a cell() mark which provides the heatmap background color and a text() mark which displays the value.
from inspect_viz.mark import cell, text
scores = Data.from_file("scores.parquet")
plot(
    cell(
        scores,
        x="task_name",
        y="model",
        fill="score_headline_value",
    ),
    text(
        scores,
        x="task_name",
        y="model",
        text="score_headline_value",
        fill="white",
    ),
    padding=0,
    color_scheme="viridis",
    height=250,
    margin_left=150,
    x_label=None,
    y_label=None
)
Marks can be used to draw dots, lines, bars, cells, arrows, text, and images on a plot.
Data
In the examples above we made Data available by reading from a parquet file. We can also read data from any Python Data Frame (e.g. Pandas, Polars, PyArrow, etc.). For example:
import pandas as pd
from inspect_viz import Data
# read directly from file
penguins = Data.from_file("penguins.parquet")
# read from Pandas DF (i.e. to preprocess first)
df = pd.read_parquet("penguins.parquet")
penguins = Data.from_dataframe(df)
You might wonder why there is a special Data class in Inspect Viz rather than using data frames directly. This is because Inspect Viz is an interactive system where data can be dynamically filtered and transformed as part of plotting, so the Data needs to be sent to the web browser rather than remaining only in the Python session. This has a couple of important implications:
Data transformations should be done using standard Python Data Frame operations prior to reading into Data for Inspect Viz.
Since Data is embedded in the web page, you will want to filter it down to only the columns required for plotting (as you don’t want the additional columns making the web page larger than is necessary).
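Both implications can be illustrated with a short Pandas sketch: transform first with ordinary data frame operations, then keep only the columns the plot needs. The sample values, the `notes` column, and the derived `score_pct` column are hypothetical. The final hand-off to Data.from_dataframe() is left as a comment so the snippet runs without inspect_viz installed.

```python
import pandas as pd

# Hypothetical eval results; the values and the "notes" column are made up.
df = pd.DataFrame({
    "model_display_name": ["Model A", "Model B", "Model C"],
    "score_headline_value": [0.42, 0.77, 0.58],
    "score_headline_stderr": [0.03, 0.02, 0.04],
    "notes": ["free-form text not needed for plotting"] * 3,
})

# 1. Do transformations with ordinary data frame operations first.
df = df.assign(score_pct=df["score_headline_value"] * 100)

# 2. Keep only the columns the plot needs, so less data is embedded in the page.
plot_df = df[["model_display_name", "score_pct", "score_headline_stderr"]]

print(list(plot_df.columns))

# With inspect_viz installed, the trimmed frame would then be wrapped:
# from inspect_viz import Data
# evals = Data.from_dataframe(plot_df)
```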
Selections
One other important thing to understand is that Data has a built-in selection which is used in filtering operations on the client. This means that if you want your inputs and plots to stay synchronized, you should pass the same Data instance to all of them (i.e. import into Data once and then share that reference). For example:
from inspect_viz import Data
from inspect_viz.plot import plot
from inspect_viz.mark import dot
from inspect_viz.input import select
from inspect_viz.layout import vconcat
# we import penguins once and then pass it to select() and dot()
penguins = Data.from_file("penguins.parquet")
vconcat(
    select(penguins, label="Species", column="species"),
    plot(
        dot(penguins, x="body_mass", y="flipper_length",
            stroke="species", symbol="species"),
        legend="symbol",
        color_domain="fixed"
    )
)
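Why the shared reference matters can be seen in a toy model. The classes below are hypothetical stand-ins written for this illustration and are not the Inspect Viz implementation. They only mimic the idea that a selection lives on the data object, so an input and a plot stay in sync only when they hold the same instance.

```python
# Toy model only: hypothetical classes, not the Inspect Viz implementation.

class ToyData:
    """Holds rows plus one built-in selection (a column/value filter)."""
    def __init__(self, rows):
        self.rows = rows
        self.selection = None  # (column, value) or None

    def visible_rows(self):
        if self.selection is None:
            return self.rows
        column, value = self.selection
        return [row for row in self.rows if row[column] == value]

class ToySelect:
    """An input that writes to the selection of the data it was given."""
    def __init__(self, data, column):
        self.data, self.column = data, column

    def choose(self, value):
        self.data.selection = (self.column, value)

class ToyPlot:
    """A plot that draws whatever its data currently exposes."""
    def __init__(self, data):
        self.data = data

    def drawn(self):
        return self.data.visible_rows()

rows = [{"species": "Adelie"}, {"species": "Gentoo"}, {"species": "Adelie"}]

# Shared instance: the input and the plot stay in sync.
shared = ToyData(rows)
select_box, plot_a = ToySelect(shared, "species"), ToyPlot(shared)
select_box.choose("Gentoo")

# Separate instances: the plot never sees the input's selection.
other_select, plot_b = ToySelect(ToyData(rows), "species"), ToyPlot(ToyData(rows))
other_select.choose("Gentoo")

print(len(plot_a.drawn()), len(plot_b.drawn()))
```

In the second case the select box updates a selection that no plot reads. That is the failure mode to expect if the same file is loaded into two separate Data objects.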
Tables
You can also display data in a tabular layout using the table() function:
from inspect_viz.table import column, table
benchmarks = Data.from_file("benchmarks.parquet")
table(
    benchmarks,
    columns=[
        column("model_organization_name", label="Organization"),
        column("model_display_name", label="Model"),
        column("model_release_date", label="Release Date"),
        column("score_headline_value", label="Score", width=100),
        column("score_headline_stderr", label="StdErr", width=100),
    ]
)
You can sort and filter tables by column, use a scrolling or paginated display, and customize several other aspects of table appearance and behavior.
Learning More
Use these resources to learn more about using Inspect Viz:
Views describes the various available pre-built views and how to customize them.
Plots goes into further depth on plotting options and how to create custom plots.
Articles on Marks, Links, Tables, Inputs, and Interactivity explore other components commonly used in visualizations.
Publishing covers publishing Inspect Viz content as standalone plots, notebooks, websites, and dashboards.
Reference provides details on the available marks, interactors, transforms, and inputs.
Examples demonstrates more advanced plotting and interactivity features.
Citation
@software{Meridan_Labs_Inspect_Viz_2025,
author = {Labs, Meridian},
title = {Inspect {Viz:} {Data} {Visualization} for {Inspect} {AI}
{Large} {Language} {Model} {Evaluations}},
date = {2025-08},
url = {https://github.com/meridianlabs-ai/inspect_viz},
langid = {en}
}