# Inspect Viz

## Welcome

Welcome to Inspect Viz, a data visualisation library for [Inspect AI](https://inspect.aisi.org.uk/). Inspect Viz provides flexible tools for creating high quality interactive visualisations from Inspect evaluations.

Here’s an Inspect Viz plot created with the [`scores_timeline()`](view-scores-timeline.qmd) function that compares benchmark scores over time[^1]:

Use the filters to switch benchmarks and restrict to models from various organizations. Hover over the points to get additional details on them or view the underlying Inspect log for the evals.

## Installation

First, install the `inspect_viz` package from GitHub as follows:

``` bash
pip install git+https://github.com/meridianlabs-ai/inspect_viz
```

Inspect Viz plots are interactive Jupyter Widgets and can be authored in a variety of ways:

1.  In any [Jupyter Notebook](https://jupyter.org/) (JupyterLab, VS Code, Colab, etc.)
2.  In VS Code with the **Jupyter: Run Current File in Interactive Window** command.
3.  In VS Code within a [Quarto](https://quarto.org) executable markdown document.

See the article on [LLM Assistance](llm-assistance.qmd) for best practices on using language models to help with creating plots.

See the articles on [Publishing](publishing.qmd) for details on including plots in documents as static images or within websites and dashboards as interactive widgets.

## Views

Inspect Viz [Views](views.qmd) are pre-built plots that work with data created by the Inspect log [data frame](https://inspect.aisi.org.uk/dataframe.html) reading functions.
For example, the [`scores_by_factor()`](view-scores-by-factor.qmd) view enables you to compare scores across models and a boolean factor:

``` python
from inspect_viz import Data
from inspect_viz.view.beta import scores_by_factor

evals = Data.from_file("evals-hint.parquet")
scores_by_factor(evals, "task_arg_hint", ("No hint", "Hint"))
```

The [`tool_calls()`](view-tool-calls.qmd) view enables you to visualize tool calls by sample:

``` python
from inspect_viz.view.beta import tool_calls

tools = Data.from_file("cybench_tools.parquet")
tool_calls(tools)
```

Available views include:

| View | Description |
|----|----|
| [`scores_by_task()`](view-scores-by-task.qmd) | Bar plot for comparing eval scores (with confidence intervals) across models and tasks. |
| [`scores_by_factor()`](view-scores-by-factor.qmd) | Bar plot for comparing eval scores by model and a boolean factor (e.g. no hint vs. hint). |
| [`scores_by_limit()`](view-scores-by-limit.qmd) | Line plot showing success rate by token limit. |
| [`scores_timeline()`](view-scores-timeline.qmd) | Scatter plot with eval scores by model, organization, and release date. Filterable by evaluation and organization. |
| [`scores_heatmap()`](view-scores-heatmap.qmd) | Heatmap with values for comparing scores across model and task. |
| [`scores_by_model()`](view-scores-by-model.qmd) | Bar plot for comparing model scores on a single eval. |
| [`tool_calls()`](view-tool-calls.qmd) | Heatmap visualising tool calls over evaluation turns. |

## Plots

While pre-built views are useful, you may also want to create your own custom plots. Plots in `inspect_viz` are composed of one or more [marks](reference/inspect_viz.mark.qmd), which can do either higher level plotting (e.g. `dot()`, `bar_x()`, `bar_y()`, `area()`, `heatmap()`, etc.) or lower level drawing on the plot canvas (e.g. `text()`, `image()`, `arrow()`, etc.).
### Dot Plot

Here is an example of a simple dot plot using a dataset of [GPQA Diamond](https://huggingface.co/datasets/fingertap/GPQA-Diamond) results:

``` python
from inspect_viz import Data
from inspect_viz.plot import plot
from inspect_viz.mark import dot

gpqa = Data.from_file("gpqa.parquet")

plot(
    dot(
        gpqa,
        x="model_release_date",
        y="score_headline_value",
        fill="model_organization_name",
        channels={
            "Model": "model_display_name",
            "Score": "score_headline_value",
            "Stderr": "score_headline_stderr",
        }
    ),
    title="GPQA Diamond",
    legend="color",
    grid=True,
    x_label="Release Date",
    y_label="Score",
    y_domain=[0, 1.0],
)
```

- Line 5: Read the dataset from a parquet file. You can also use `Data.from_dataframe()` to read data from any Pandas, Polars, or PyArrow data frame.
- Line 8: Plot using a `dot()` mark. The `plot()` function takes one or more marks or interactors.
- Line 12: Map the “model_organization_name” column to the `fill` scale of the plot (causing each organization to have its own color).
- Lines 13-17: Show tooltip with defined channels.
- Line 20: Add a `legend` to the plot as a key to our color mappings.
- Line 24: Ensure that the y-axis goes from 0 to 1.

### Bar Plot

Here is a simple horizontal bar plot that shows the score of each model:

``` python
from inspect_viz.mark import bar_x

evals = Data.from_file("agi-lsat-ar.parquet")

plot(
    bar_x(
        evals,
        x="score_headline_value",
        y="model_display_name",
        sort={"y": "x", "reverse": True},
        fill="#3266ae"
    ),
    title="AR-LSAT",
    x_label="Score",
    y_label=None,
    margin_left=120
)
```

- Line 10: Sort the bars by score (descending).
- Lines 15-16: The y-axis is labeled with model names, so remove the default label and ensure it has enough margin.

## Links

Inspect Viz supports creating direct links from visualizations to published Inspect log transcripts. Links can be made at the eval level, or to individual samples, messages, or events.
For example, this plot produced with `scores_by_model()` includes a link to the underlying logs in its tooltips:

``` python
from inspect_viz.view.beta import scores_by_model

scores_by_model(evals) # baseline=0.91
```

The pre-built [Views](views.qmd) all support linking when a `log_viewer` column is available in the dataset. To learn more about amending datasets with viewer URLs as well as adding linking support to your own plots, see the article on [Links](components-links.qmd).

## Filters

Use [inputs](reference/inspect_viz.input.qmd) to enable filtering datasets and dynamically updating plots. For example, if we had multiple benchmarks available for a scores timeline, we could add a `select()` input for choosing between them:

``` python
from inspect_viz.input import select
from inspect_viz.layout import vconcat

benchmarks = Data.from_file("benchmarks.parquet")

vconcat(
    select(
        benchmarks,
        label="Benchmark",
        column="task_name",
        value="auto"
    ),
    plot(
        dot(
            benchmarks,
            x="model_release_date",
            y="score_headline_value",
            fill="model_organization_name",
        ),
        legend="color",
        grid=True,
        x_label="Release Date",
        y_label="Score",
        y_domain=[0, 1.0],
        color_domain="fixed"
    )
)
```

We’ve introduced a few new things here:

1.  The `vconcat()` function from the [layout](reference/inspect_viz.layout.qmd) module lets us stack inputs on top of our plot.
2.  The `select()` function from the [input](reference/inspect_viz.input.qmd) module binds a select box to the `task_name` column.
3.  The `color_domain="fixed"` argument to `plot()` indicates that we want to preserve model organization colors even when the plot is filtered.

## Marks

So far the plots we’ve created include only a single [mark](reference/inspect_viz.mark.qmd). However, many of the more interesting plots you’ll create will include multiple marks.

For example, here we create a heatmap of evaluation scores by model. There is a `cell()` mark which provides the heatmap background color and a `text()` mark which displays the value.
``` python
from inspect_viz.mark import cell, text

scores = Data.from_file("scores.parquet")

plot(
    cell(
        scores,
        x="task_name",
        y="model",
        fill="score_headline_value",
    ),
    text(
        scores,
        x="task_name",
        y="model",
        text="score_headline_value",
        fill="white",
    ),
    padding=0,
    color_scheme="viridis",
    height=250,
    margin_left=150,
    x_label=None,
    y_label=None
)
```

Marks can be used to draw dots, lines, bars, cells, arrows, text, and images on a plot.

## Data

In the examples above we made `Data` available by reading from a parquet file. We can also read data from any Python Data Frame (e.g. Pandas, Polars, PyArrow, etc.). For example:

``` python
import pandas as pd
from inspect_viz import Data

# read directly from file
penguins = Data.from_file("penguins.parquet")

# read from Pandas DF (i.e. to preprocess first)
df = pd.read_parquet("penguins.parquet")
penguins = Data.from_dataframe(df)
```

You might wonder why there is a special `Data` class in Inspect Viz rather than using data frames directly. This is because Inspect Viz is an interactive system where data can be dynamically filtered and transformed as part of plotting, so the `Data` needs to be sent to the web browser rather than remaining only in the Python session. This has a couple of important implications:

1.  Data transformations should be done using standard Python Data Frame operations *prior* to reading into `Data` for Inspect Viz.
2.  Since `Data` is embedded in the web page, you will want to filter it down to only the columns required for plotting (as you don’t want the additional columns making the web page larger than is necessary).

### Selections

One other important thing to understand is that `Data` has a built in *selection* which is used in filtering operations on the client. This means that if you want your inputs and plots to stay synchronized, you should pass the same `Data` instance to all of them (i.e. import into `Data` once and then share that reference).
For example:

``` python
from inspect_viz import Data
from inspect_viz.plot import plot
from inspect_viz.mark import dot
from inspect_viz.input import select
from inspect_viz.layout import vconcat

# we import penguins once and then pass it to select() and dot()
penguins = Data.from_file("penguins.parquet")

vconcat(
    select(penguins, label="Species", column="species"),
    plot(
        dot(penguins, x="body_mass", y="flipper_length",
            stroke="species", symbol="species"),
        legend="symbol",
        color_domain="fixed"
    )
)
```

## Tables

You can also display data in a tabular layout using the `table()` function:

``` python
from inspect_viz.table import column, table

benchmarks = Data.from_file("benchmarks.parquet")

table(
    benchmarks,
    columns=[
        column("model_organization_name", label="Organization"),
        column("model_display_name", label="Model"),
        column("model_release_date", label="Release Date"),
        column("score_headline_value", label="Score", width=100),
        column("score_headline_stderr", label="StdErr", width=100),
    ]
)
```

You can sort and filter tables by column, use a scrolling or paginated display, and customize several other aspects of table appearance and behavior.

## Learning More

Use these resources to learn more about using Inspect Viz:

- [Views](views.qmd) describes the various available pre-built views and how to customize them.
- [Plots](components-plots.qmd) goes into further depth on plotting options and how to create custom plots.
- Articles on [Marks](components-marks.qmd), [Links](components-links.qmd), [Tables](components-tables.qmd), [Inputs](components-inputs.qmd), and [Interactivity](components-interactivity.qmd) explore other components commonly used in visualizations.
- [Publishing](publishing.qmd) covers publishing Inspect Viz content as standalone plots, notebooks, websites, and dashboards.
- [Reference](reference/index.qmd) provides details on the available marks, interactors, transforms, and inputs.
- [Examples](examples/index.qmd) demonstrates more advanced plotting and interactivity features.

[^1]: This plot was inspired by and includes data from the [Epoch AI](https://epoch.ai/data/ai-benchmarking-dashboard) Benchmarking Hub.

# Plots

A `plot()` produces a single visualisation and consists of one or more *marks* (graphical primitives such as bars, areas, and lines), which serve as chart layers. Each plot has a dedicated set of encoding *channels* with named *scale* mappings such as `x`, `y`, `color`, `opacity`, etc. Below we’ll describe the core semantics of plots and the various ways you can customize them.

## Basics

Here is a simple dot plot that demonstrates some key concepts:

``` python
from inspect_viz import Data
from inspect_viz.plot import plot
from inspect_viz.mark import dot

penguins = Data.from_file("penguins.parquet")

plot(
    dot(penguins, x="body_mass", y="flipper_length",
        stroke="species", symbol="species"),
    legend="symbol",
    grid=True,
    width=700,
    height=400
)
```

- Lines 8-9: `dot()` mark for a simple dot plot, using a distinct `stroke` and `symbol` to denote the “species” column.
- Line 10: Legend in the default location, keyed by `symbol`.
- Lines 11-13: Additional attributes that affect plot size and appearance.

## Facets

Plots support faceting of the `x` and `y` dimensions, producing associated `fx` and `fy` scales. For example, here we compare model performance on several tasks.
The `task_name` is the `fx` scale, resulting in a separate grouping of bars for each task:

``` python
from inspect_viz import Data
from inspect_viz.plot import plot, legend
from inspect_viz.mark import bar_y

evals = Data.from_file("evals.parquet")

plot(
    bar_y(
        evals, x="model", fx="task_name",
        y="score_headline_value",
        fill="model",
        tip=True,
        channels={
            "Task": "task_name",
            "Model": "model",
            "Score": "score_headline_value",
            "Log Viewer": "log_viewer"
        }
    ),
    legend=legend("color", frame_anchor="bottom"),
    x_label=None, x_ticks=[], fx_label=None,
    y_label="score", y_domain=[0, 1.0]
)
```

- Line 9: Add an x-facet (“task_name”) using the `fx` option.
- Line 20: Define the legend using the `legend()` function (to enable setting `frame_anchor` and other options).
- Line 21: Remove default x labeling as it is handled by the legend.
- Line 22: Tweak the y-axis with a shorter label and ensure that it goes all the way up to 1.0.

## Marks

The plots above use only a single mark (`dot()` and `bar_y()` respectively). More sophisticated plots are often constructed with multiple marks. For example, here is a plot that adds a regression line mark to a standard dot plot:

``` python
from inspect_viz import Data
from inspect_viz.mark import dot, regression_y
from inspect_viz.plot import plot

athletes = Data.from_file("athletes.parquet")

plot(
    dot(
        athletes,
        x="weight", y="height",
        fill="sex", opacity=0.1
    ),
    regression_y(
        athletes,
        x="weight", y="height",
        stroke="sex"
    ),
    legend="color"
)
```

- Lines 8-12: Use `fill` to distinguish male and female athletes; use `opacity` to deal with a large density of data points.
- Lines 13-17: Use `stroke` to ensure that male and female athletes each get their own regression line.

## Tooltips

Tooltips enable you to provide additional details when the user hovers their mouse over various regions of the plot. Tooltips are enabled automatically for dot marks (`dot()`, `dot_x()`, `dot_y()`, `circle()`, and `hexagon()`) and cell marks (`cell()`, `cell_x()`, etc.)
and can be enabled with `tip=True` for other marks. For example:

``` python
plot(
    bar_y(
        evals, x="model", fx="task_name",
        y="score_headline_value",
        fill="model",
        tip=True
    ),
    legend=legend("color", frame_anchor="bottom"),
    x_label=None, x_ticks=[], fx_label=None,
    y_label="score", y_domain=[0, 1.0]
)
```

- Line 6: Add `tip=True` to enable tooltips for marks where they are not automatically enabled.

![](tooltip-basic.png)

Note that tooltips can interfere with plot interactions. For example, if your bar plot were clickable to drive selections in other plots, you would not want to specify `tip=True`.

### Channels

As illustrated above, tooltips show all dataset channels that provide scales (e.g. `x`, `y`, `fx`, `stroke`, `fill`, `symbol`, etc.). There are a few things we might want to improve about the default display:

1.  The labels are scale names rather than domain specific names (e.g. “fx” rather than “model”).
2.  The order of labels isn’t ideal.
3.  There are some duplicate values (e.g. “fill” and “fx”).
4.  We might want to include additional columns not used in the rest of the plot (e.g. a link to the log file).

You can exercise more control over the tooltip by specifying `channels` along with the mark. For example:

``` python
plot(
    bar_y(
        evals, x="model", fx="task_name",
        y="score_headline_value",
        fill="model",
        tip=True,
        channels={
            "Task": "task_name",
            "Model": "model",
            "Score": "score_headline_value",
            "Log Viewer": "log_viewer"
        }
    ),
    ...
)
```

- Lines 7-12: The `channels` option maps labels to columns in the underlying data. All defined `channels` will appear in the tooltip. URL values are automatically turned into links as shown here.

![](tooltip-channels.png)

## Titles

Plot titles can be added using the `title` option.
For example, here we add a title at the top of the frame:

``` python
plot(
    dot(athletes, x="weight", y="height", fill="sex", opacity=0.1),
    regression_y(athletes, x="weight", y="height", stroke="sex"),
    title="Olympic Athletes",
    legend="color"
)
```

If you have facet labels on the top of the x-axis, you may need to provide some additional top margin for the `title` so that it is placed above the facet labels. Use the `margin_top` option of the `title()` function to customize this:

``` python
from inspect_viz.mark import title

plot(
    ...
    title=title("Olympic Athletes", margin_top=40),
    ...
)
```

You can also customize the font size, weight, and family using the `title()` function.

## Axes

There are several options available for controlling the domain, range, ticks, and labels for axes.

### Labels

By default, axis labels are taken from the columns they are mapped to. Specify an `x_label` or `y_label` to override this:

``` python
plot(
    ...,
    x_label="release_date",
    y_label="score"
)
```

If you want no axis label at all, pass `None`. For example:

``` python
plot(
    ...,
    x_label=None
)
```

### Domain

By default, the x and y axes have a domain that matches the underlying data. For example, if the data ranges from 0 to 0.8 the axes will reflect this. Set a specific `x_domain` or `y_domain` to override this. For example, here we specify that we want the y-axis to span from 0 to 1.0:

``` python
plot(
    ...,
    y_domain=[0, 1.0]
)
```

You can also specify “fixed” for a domain, which will preserve the domain of the initial values plotted. This is useful if you have created filters for your data and you want the axes to remain stable across filtering. For example:

``` python
plot(
    ...,
    y_domain="fixed"
)
```

### Ticks

You can explicitly control the axis ticks using the `x_ticks` and `y_ticks` options.
For example, here we specify ticks from 0 to 100 by 10:

``` python
plot(
    ...,
    # range() excludes its stop value, so use 101 to include 100
    x_ticks=range(0, 101, 10)
)
```

If you want no ticks at all specify `[]`, for example:

``` python
plot(
    ...,
    x_ticks=[]
)
```

There are several other tick related options:

- `[x,y]_tick_size`: The length of axis tick marks in pixels.
- `[x,y]_tick_rotate`: The rotation angle of axis tick labels in degrees clockwise.
- `[x,y]_tick_spacing`: The desired approximate spacing between adjacent axis ticks, affecting the default ticks.
- `[x,y]_tick_padding`: The distance between an axis tick mark and its associated text label.
- `[x,y]_tick_format`: How to format inputs for axis tick labels (a [d3-format](https://d3js.org/d3-format) or [d3-time-format](https://d3js.org/d3-time-format)).

## Legends

*Legends* can be added to `plot` specifications or included as standalone elements:

``` python
from inspect_viz.plot import plot, legend

plot(
    ...,
    legend=legend("color")
)
```

Below we’ll describe the options used to position and style legends. See the `legend()` function documentation for details on all legend options.

### Positioning

- Use `frame_anchor` to position the legend on a side or corner of the plot.
- Use `inset` to position the legend inside the plot area (use `inset_x` and `inset_y` to position more precisely).

For example, to place the legend inset in the top left, you could write:

``` python
legend("color", frame_anchor="top-left", inset=20)
```

![](legend-basic.png)

### Legend Style

Legends are by default placed in a bordered box. Use the `border` and `background` options to control box colors (specifying `False` to omit border or background color). For example:

``` python
legend("color", border="blue", background="white")
```

### Multiple Legends

You can pass multiple legends (strings like “color” or calls to `legend()`) to the `plot()` function.
Each may be positioned independently using `frame_anchor` and `inset`, or if they share a position, the legends will be merged into a container in that location.

For example, the following adds two legends in the same container in the default position (right of the plot):

``` python
# x_axis and y_axis are assumed to be defined elsewhere
plot(
    dot(penguins, x=x_axis, y=y_axis, stroke="species", symbol="species"),
    grid=True,
    x_label="Body mass (g) →",
    y_label="↑ Flipper length (mm)",
    legend=[legend("color"), legend("symbol")]
)
```

![](legend-multiple.png)

### Interactions

Legends also act as interactors, taking a bound `Selection` as a `target` parameter. For example, discrete legends use the logic of the `toggle` interactor to enable point selections. Two-way binding is supported for Selections using *single* resolution, enabling legends and other interactors to share state.

See the docs on [Toggle](components-interactivity.qmd#toggle) interactors for an example of an interactive legend.

### Legend Name

The `name` directive gives a `plot` a unique name. A standalone legend can reference a named plot (e.g. `legend(..., for_plot="penguins")`) to avoid respecifying scale domains and ranges.

## Baselines

Baselines can be added by including `baseline()` marks in the plot definition (or by including them in the `marks` option of pre-built [views](views.qmd)). For example, here we add a baseline at a weight of 70 to the athletes plot:

``` python
from inspect_viz.mark import baseline

plot(
    dot(athletes, x="weight", y="height", fill="sex", opacity=0.1),
    baseline(70),
    regression_y(athletes, x="weight", y="height", stroke="sex"),
    legend="color"
)
```

If you have a simple static baseline, you may simply provide the value, along with other options to customize the label, position, and other attributes of the baseline.

You can also use a transformation function like `median()` to define baselines:

``` python
from inspect_viz.mark import baseline
from inspect_viz.transform import median

plot(
    ...
    baseline(
        median("weight"),
        data=athletes,
        label="Median",
        label_position="middle",
        color="red"),
    ...
)
```

By default, baselines are drawn using the x-axis values. To draw a baseline using the y-axis values, pass `orientation="y"` to the baseline function.

## Margins

Since the text included in axis labels is dynamic, you will often need to adjust the plot margins to ensure that the text fits properly within the plot. Use the `margin_top`, `margin_left`, `margin_right`, and `margin_bottom` options to do this. Note that there are also `facet_margin_top`, `facet_margin_left`, etc. options available.

For example, here we set a `margin_left` of 100 pixels to ensure that potentially long model names have room to display:

``` python
plot(
    bar_y(...),
    margin_left=100
)
```

## Colors

Use the `color_scheme` option to the `plot()` function to pick a theme (see the `ColorScheme` reference for available schemes). Use the `color_range` option to specify an explicit set of colors.

For example, here we use the “tableau10” `color_scheme`:

``` python
plot(
    bar_y(
        evals, x="model", fx="task_name",
        y="score_headline_value",
        fill="model",
    ),
    legend=legend("color", frame_anchor="bottom"),
    x_label=None, x_ticks=[], fx_label=None,
    y_label="score", y_domain=[0, 1.0],
    color_scheme="tableau10"
)
```

## Data

In the examples above we made `Data` available by reading from a parquet file. We can also read data from any Python Data Frame (e.g. Pandas, Polars, PyArrow, etc.). For example:

``` python
import pandas as pd
from inspect_viz import Data

# read directly from file
penguins = Data.from_file("penguins.parquet")

# read from Pandas DF (i.e. to preprocess first)
df = pd.read_parquet("penguins.parquet")
penguins = Data.from_dataframe(df)
```

You might wonder why there is a special `Data` class in Inspect Viz rather than using data frames directly.
This is because Inspect Viz is an interactive system where data can be dynamically filtered and transformed as part of plotting, so the `Data` needs to be sent to the web browser rather than remaining only in the Python session. This has a couple of important implications:

1.  Data transformations should be done using standard Python Data Frame operations *prior* to reading into `Data` for Inspect Viz.
2.  Since `Data` is embedded in the web page, you will want to filter it down to only the columns required for plotting (as you don’t want the additional columns making the web page larger than is necessary).

### Selections

One other important thing to understand is that `Data` has a built in *selection* which is used in filtering operations on the client. This means that if you want your inputs and plots to stay synchronized, you should pass the same `Data` instance to all of them (i.e. import into `Data` once and then share that reference). For example:

``` python
from inspect_viz import Data
from inspect_viz.plot import plot
from inspect_viz.mark import dot
from inspect_viz.input import select
from inspect_viz.layout import vconcat

# we import penguins once and then pass it to select() and dot()
penguins = Data.from_file("penguins.parquet")

vconcat(
    select(penguins, label="Species", column="species"),
    plot(
        dot(penguins, x="body_mass", y="flipper_length",
            stroke="species", symbol="species"),
        legend="symbol",
        color_domain="fixed"
    )
)
```

## SQL

You can use the `sql()` transform function to dynamically compute the values of channels within plots.
For example, here we dynamically add a `bias` parameter to a column:

``` python
from inspect_viz import Data, Param
from inspect_viz.input import slider
from inspect_viz.layout import vconcat
from inspect_viz.mark import area_y
from inspect_viz.plot import plot
from inspect_viz.transform import sql

random_walk = Data.from_file("random-walk.parquet")
bias = Param(100)

vconcat(
    slider(label="Bias", target=bias, min=0, max=1000, step=1),
    plot(
        area_y(
            random_walk,
            x="t",
            y=sql(f"v + {bias}"),
            fill="steelblue"
        )
    )
)
```

Any valid SQL expression can be used. For example, here we use an `IF` expression to set the stroke color based on a column value:

``` python
stroke=sql("IF(task_arg_hint, 'blue', 'red')")
```

## Dates

### Numeric Values

In some cases your plots will want to deal with date columns as numeric values (e.g. for plotting a regression line). For this case, use the `epoch_ms()` transform function to take a date and turn it into a timestamp (milliseconds since the epoch). For example:

``` python
from inspect_viz.mark import regression_y
from inspect_viz.transform import epoch_ms

regression_y(
    evals,
    x=epoch_ms("model_release_date"),
    y="score_headline_value",
    stroke="#AAAAAA"
)
```

Note that when doing this you’ll also want to apply formatting to the tick labels so they appear as dates (the next section covers how to do this).

### Tick Formatting

Use the tick format attributes (e.g. `x_tick_format` and `y_tick_format`) to specify the formatting for date columns on tick labels. For example:

``` python
plot(
    ...,
    x_tick_format="%b. %Y"
)
```

You can specify any [d3-time-format](https://d3js.org/d3-time-format) as the tick format.

### Reductions

In some cases you may have timeseries data which you’d like to reduce across months or years (e.g. collapse year values to enable comparison over months only). The following transformations can be used to do this:

| Function | Description |
|----|----|
| `date_day()` | Transform a Date value to a day of the month for cyclic comparison. Year and month values are collapsed to enable comparison over days only. |
| `date_month()` | Transform a Date value to a month boundary for cyclic comparison. Year values are collapsed to enable comparison over months only. |
| `date_day_month()` | Map date/times to a month and day value, all within the same year for comparison. |

## Attributes

*Attributes* are plot-level settings such as `width`, `height`, margins, and scale options (e.g., `x_domain`, `color_range`, `y_tick_format`). Attributes may be `Param`-valued, in which case a plot updates upon param changes.

Some of the more useful plot attributes include:

- `width`, `height`, and `aspect_ratio` for controlling plot size.
- `margin` and `facet_margin` (and more specific margins like `margin_top`) for controlling layout margins.
- `style` for providing CSS styles.
- `aria_label` and `aria_description`, `x_aria_label`, `x_aria_description`, etc. for accessibility attributes.
- `x_domain`, `x_range`, `y_domain`, and `y_range` for controlling the domain and range of axes.
- Tick settings for `x`, `y`, `fx`, and `fy` axes (e.g. `x_ticks`, `x_tick_rotate`, etc.)
- `r` (radius) scale settings (e.g. `r_domain`, `r_range`, `r_label`, etc.)

See `PlotAttributes` for documentation on all available plot attributes.

# Marks

## Overview

*Marks* are graphical primitives, often with accompanying data transforms, that serve as chart layers. Marks accept a `Data` source (which is queried as required) and a set of supported options, including encoding *channels* (such as `x`, `y`, `fill`, and `stroke`) that can encode data *fields*. A data field may be a column reference or query expression, including dynamic param values. Common expressions include aggregates (`count()`, `sum()`, `avg()`, `median()`, *etc.*), window functions, date functions, and a `bin()` transform.
Marks support dual modes of operation: if an explicit array of data values is provided instead of a backing `Data` reference, the values will be visualized without issuing any queries to the data. This functionality is particularly useful for adding manual annotations, such as custom rules or text labels.

## Basic

Basic marks, such as `dot()`, `bar_x()`, `bar_y()`, `rect()`, `cell()`, `text()`, `tick()`, `rule_x()`, and `rule_y()`, mirror their namesakes in [Observable Plot](https://observablehq.com/plot/). For example, here is a plot with two marks (a dot plot and a regression line):

``` python
from inspect_viz import Data
from inspect_viz.plot import plot
from inspect_viz.mark import dot, regression_y

athletes = Data.from_file("athletes.parquet")

plot(
    dot(athletes, x="weight", y="height", fill="sex", opacity=0.1),
    regression_y(athletes, x="weight", y="height", stroke="sex")
)
```

Variants such as `bar_x()` and `bar_y()` indicate spatial orientation and data type assumptions. `bar_y()` indicates vertical bars (continuous `y` over an ordinal `x` domain), whereas `rect_y()` indicates a continuous `x` domain.

`Data` is backed by a DuckDB SQL database running in the web browser. Basic marks follow a straightforward query construction process:

- Iterate over all encoding channels to build a `SELECT` query.
- If no aggregates are encountered, query all fields directly.
- If aggregates are present, include non-aggregate fields as `GROUP BY` criteria.
- If provided, map filtering criteria to a SQL `WHERE` clause.

## Channels

Marks are constructed by mapping *channels* to scales. Besides columns, other types of channel inputs include transforms (e.g. `count()`, `bin()`, `stddev()`, or even arbitrary `sql()` statements) as well as literal values (often used for `text()` annotations on plots or a `line()` drawn at an arbitrary location).
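
Because channels can carry either plain columns or aggregate transforms, the query construction steps listed under Basic can be sketched in plain Python. This is only an illustration under stated assumptions, not Inspect Viz's actual implementation: the helper names `build_query()` and `is_aggregate()`, the set of recognized aggregate names, and the aliasing style are all hypothetical.

``` python
import re
from typing import Dict, Optional

# Hypothetical sketch: these helpers are NOT part of inspect_viz.
# Assumed, illustrative set of aggregate function names.
_AGGREGATE = re.compile(
    r"^\s*(count|sum|avg|median|min|max|stddev)\s*\(", re.IGNORECASE
)


def is_aggregate(expr: str) -> bool:
    """Return True if a channel expression starts with an aggregate call."""
    return _AGGREGATE.match(expr) is not None


def build_query(
    table: str, channels: Dict[str, str], where: Optional[str] = None
) -> str:
    """Build a SELECT statement from a mapping of channel name -> expression."""
    # Step 1: every encoding channel becomes one aliased SELECT item.
    select_items = [f'{expr} AS "{name}"' for name, expr in channels.items()]
    query = f"SELECT {', '.join(select_items)} FROM {table}"

    # Step 4: filtering criteria map to a WHERE clause.
    if where:
        query += f" WHERE {where}"

    # Steps 2-3: only when an aggregate is present do the non-aggregate
    # fields become GROUP BY criteria (de-duplicated, order preserved).
    if any(is_aggregate(expr) for expr in channels.values()):
        group_by = list(dict.fromkeys(
            expr for expr in channels.values() if not is_aggregate(expr)
        ))
        if group_by:
            query += f" GROUP BY {', '.join(group_by)}"
    return query


# No aggregates: fields are queried directly.
plain = build_query("evals", {"x": "model", "y": "score_headline_value"})

# One aggregate: the remaining fields are grouped, and the filter is applied.
grouped = build_query(
    "evals",
    {"x": "model", "fill": "model", "y": "avg(score_headline_value)"},
    where="task_name = 'gpqa'",
)
print(plain)
print(grouped)
```

In this sketch the first call yields a plain `SELECT` with no grouping, while the second adds `WHERE task_name = 'gpqa' GROUP BY model`, since `model` is the only non-aggregate field.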
Here are the scales which you will most commonly bind channels to:

| Scale | Description |
|----------|---------------------------------------------------------|
| `x` | Horizontal position |
| `y` | Vertical position |
| `fx` | Horizontal facet position |
| `fy` | Vertical facet position |
| `z` | Optional ordinal channel for grouping data into series. |
| `r` | Radius of a mark (e.g. circle radius) |
| `stroke` | Color for mark |
| `fill` | Fill color for mark |
| `symbol` | Symbol used for mark |

In addition, many marks have scales to deal with ranges of x or y values (e.g. area marks, arrows, etc.):

| Scale | Description |
|-------|------------------------------|
| `x1` | Starting horizontal position |
| `x2` | Ending horizontal position |
| `y1` | Starting vertical position |
| `y2` | Ending vertical position |

## Connected

The `area()` and `line()` marks connect consecutive sample points. Connected marks are treated similarly to basic marks, with one notable addition: the queries for spatially oriented marks (`area_y()`, `line_x()`) can apply [M4 optimization](https://observablehq.com/@uwdata/m4-scalable-time-series-visualization). The query construction method uses plot width and data min/max information to determine the pixel resolution of the mark range. When the data points outnumber available pixels, M4 performs perceptually faithful pixel-aware binning of the series, limiting the number of drawn points. This optimisation offers dramatic data reductions for both single and multiple series.

Separately, a `regression_y()` mark is available for linear regression fits. Regression calculations and associated statistics are performed in-database in a single aggregate query. The mark then draws the regression line and optional confidence interval area.

## Density

The `density_y()` mark performs 1D kernel density estimation (KDE). The `density_y()` mark defaults to areas, but supports a `type` option to instead use lines, points, or other basic marks.
The generated query performs *linear binning*, an alternative to standard binning that proportionally distributes the weight of a point between adjacent bins to provide greater accuracy for density estimation. The query uses subqueries for the “left” and “right” bins, then aggregates the results. The query result is a 1D grid of binned values which are then smoothed. As smoothing is performed in the browser, interactive bandwidth updates are processed immediately.

The `density()`, `contour()`, `heatmap()`, and `raster()` marks compute densities over a 2D domain using either linear (default) or standard binning. Smoothing again is performed in browser; setting the `bandwidth` option to zero disables smoothing. The `contour()` mark then performs contour generation, whereas the `raster()` mark generates a coloured bitmap. The `heatmap()` mark is a convenient shortcut for a `raster()` that performs smoothing by default. Dynamic changes of bandwidth, contour thresholds, and color scales are handled immediately in browser.

The `hexbin()` mark pushes hexagonal binning and aggregation to the database. Color and size channels may be mapped to `count()` or other aggregates. Hexagon plotting symbols can be replaced by other basic marks (such as `text()`) via the `type` option.

The `dense_line()` mark creates a density map of line segments, rather than points. Line density estimation is pushed to the database. To ensure that steep lines are not over-represented, we approximate arc-length normalisation for each segment by normalising by the number of filled raster cells on a per-column basis. We then aggregate the resulting weights for all series to produce the line densities.

# Links

## Overview

Inspect Viz supports creating direct links from visualizations to published Inspect log transcripts. Links can be made at the eval level, or to individual samples, messages, or events.

The basic steps required for creating links to logs from visualizations are:

1. Publish your log directory using the [`inspect view bundle`](https://inspect.aisi.org.uk/log-viewer.html#sec-publishing) command.
2. Read logs into a data frame using the [log dataframe](https://inspect.aisi.org.uk/dataframe.html) functions, then amend the data frame with log viewer URLs that point to the published bundle (we’ll cover how to do this below).
3. Include the log viewer URLs as custom channels on your plot [marks](components-marks.qmd) as appropriate. The link will be available within the [tooltip](components-plots.qmd#tooltips) for your mark.

## Step 1: Publish Logs

You can use the command [`inspect view bundle`](https://inspect.aisi.org.uk/log-viewer.html#sec-publishing) (or the [`bundle_log_dir()`](https://inspect.aisi.org.uk/reference/inspect_ai.log.html#bundle_log_dir) function from Python) to create a self-contained directory with the log viewer and a set of logs for display. This directory can then be deployed to any static web server ([GitHub Pages](https://docs.github.com/en/pages), [S3 buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/WebsiteHosting.html), [Netlify](https://docs.netlify.com/get-started/), etc.) to provide a standalone version of the viewer. For example, to bundle the `logs` directory to a directory named `logs-www`:

``` bash
$ inspect view bundle --log-dir logs --output-dir logs-www
```

You can then deploy `logs-www` to any static web host.

## Step 2: Prepare Data

Next, you’ll want to amend the data frame that you’ve read with e.g. [`evals_df()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#evals_df) or [`samples_df()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#samples_df) with log viewer URLs that point to the published logs.
You can do this using the [`prepare()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#prepare) and [`log_viewer()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#log_viewer) functions from the `inspect_ai.analysis` module. For example, if you have previously published your “logs” directory to https://example.com/logs/:

``` python
from inspect_viz import Data
from inspect_ai.analysis import evals_df, prepare, log_viewer

# read evals and amend with log viewer URL
df = evals_df("logs")
df = prepare(df, log_viewer("evals", {
    "logs": "https://example.com/logs/"
}))

# read as inspect viz data
evals = Data.from_dataframe(df)
```

## Step 3: Link Channel

Once your data is prepared, you need to ensure that links are incorporated into plots.

### Custom Plot

If you are creating a custom plot, you should add a mapping to the `log_viewer` column in your mark’s `channels`. For example:

``` python
from inspect_viz import Data
from inspect_viz.plot import plot, legend
from inspect_viz.mark import bar_y

evals = Data.from_file("evals.parquet")

plot(
    bar_y(
        evals,
        x="model", fx="task_name", y="score_headline_value",
        channels={ "Log Viewer": "log_viewer" },
        fill="model",
    ),
    legend=legend("color", frame_anchor="bottom"),
    x_label=None, x_ticks=[], fx_label=None,
    y_label="score", y_domain=[0, 1.0]
)
```

- Line 11: Add a “Log Viewer” channel mapped to the `log_viewer` column created with the `prepare()` function above.

### Built-In Views

The built-in [Views](views.qmd) already support the `log_viewer` column, so links appear automatically when using those functions. For example:

``` python
from inspect_viz import Data
from inspect_viz.view.beta import scores_by_model

evals = Data.from_file("agi-lsat-ar.parquet")
scores_by_model(evals)
```

If you mouse over the bars you will see a log viewer link which you can click to navigate to the log.

# Tables

Use tables to display an interactive grid of data used in your visualization.
Tables support commonly used operations like sorting, filtering, and pagination, along with a variety of other customization options.

## Basics

In its simplest form, the `table()` function will display the contents of the `Data` provided. For example, the following:

``` python
from inspect_viz import Data
from inspect_viz.table import table

penguins = Data.from_file("penguins.parquet")
table(penguins)
```

results in a table displaying all the columns and rows in the penguins dataset.

In addition to providing the base `Data` for the table, you may also select which columns are displayed:

``` python
from inspect_viz import Data
from inspect_viz.table import table

penguins = Data.from_file("penguins.parquet")
table(penguins, columns=[
    "species",
    "island",
    "sex",
    "body_mass"])
```

Tables have a number of global options for configuring their behavior, but also have many options specific to one or more columns. To specify column-level options, use the `column` function in the list of columns rather than simply passing the column name:

``` python
from inspect_viz import Data
from inspect_viz.table import column, table

penguins = Data.from_file("penguins.parquet")
table(penguins, columns=[
    column("species", align="center"),
    "island",
    "sex",
    "body_mass"])
```

### Size

By default, tables will have a height which matches the size of their content and a width which fills their container (with a default maximum size of 500px). You can explicitly provide a height and width value in pixels for the table if you’d like the table to be a specific size:

``` python
from inspect_viz import Data
from inspect_viz.table import table

penguins = Data.from_file("penguins.parquet")
table(penguins, height=200, width=550)
```

You can use `max_width` to constrain the maximum width of the table in pixels. It will still attempt to fill its container, but its width will not exceed the `max_width`.
## Columns

When providing column data for a table, you can provide a list of column names from your `Data` to be displayed. You can also use the `column` function to provide additional options for each column. For example, to customize the string that is displayed in the header for the column, use the `label` option:

``` python
from inspect_viz import Data
from inspect_viz.table import column, table

penguins = Data.from_file("penguins.parquet")
table(penguins, columns=[
    "species",
    "island",
    "sex",
    column("body_mass", label="mass")])
```

### Width

If no explicit column size is provided, the width of each column is an equal share of the available space. You can specify the width of columns either using an explicit pixel size:

``` python
from inspect_viz import Data
from inspect_viz.table import column, table

penguins = Data.from_file("penguins.parquet")
table(penguins, width=370, columns=[
    column("species", width=80),
    column("island", width=100),
    column("sex", width=70),
    column("body_mass", width=100)])
```

or using `flex` for some or all of the columns. Flex sizing works by dividing the remaining space in the grid among all flex columns in proportion to their flex value.

``` python
from inspect_viz import Data
from inspect_viz.table import column, table

penguins = Data.from_file("penguins.parquet")
table(penguins, width=550, columns=[
    column("species", flex=1),
    column("island", flex=1.2),
    column("sex", width=70),
    column("body_mass", flex=1)])
```

You can also use `max_width` to set a maximum width for a column or `min_width` to set a minimum width for a column. These values cap the width when columns are being sized automatically using flex sizing.

### Alignment

You can control the alignment of the values within each column’s header and body using the `align` and `header_align` options.
For example:

``` python
from inspect_viz import Data
from inspect_viz.table import column, table

penguins = Data.from_file("penguins.parquet")
table(penguins, columns=[
    column("species", align="center", header_align="center"),
    column("island"),
    column("sex"),
    column("body_mass")])
```

### Formatting

You can control the formatting of each cell’s value using the `format` option. The `format` option accepts a [d3-format](https://d3js.org/d3-format) string for numeric values and a [d3-time-format](https://d3js.org/d3-time-format) string for date values to define how the value will be formatted.

``` python
table(penguins, columns=[
    column("species"),
    column("island"),
    column("sex"),
    column("body_mass", format=",.2f")])
```

Default formats for values are as follows:

[TABLE]

### Text Wrapping

You can control the text wrapping behavior of the values within each column’s header and body using the `wrap_text` and `header_wrap_text` options. This is most frequently paired with `auto_height` to create rows with automatic heights which wrap text. For example:

``` python
table(penguins, columns=[
    column("species", auto_height=True, wrap_text=True),
    column("island", flex=1.2),
    column("sex", width=70),
    column("body_mass", flex=1)])
```

## Rows

### Height

By default, each row of the table, including the header row, is 29px tall. You can set an explicit row size for the body of the table using the `row_height` argument. Set the header’s row height using the `header_height` argument:

``` python
from inspect_viz import Data
from inspect_viz.table import table

penguins = Data.from_file("penguins.parquet")
table(penguins, header_height=60, row_height=50)
```

### Auto Height

In addition to explicitly providing the heights for rows, you can also allow the content to determine the height of the row. To do this, configure one or more columns with `auto_height`.
The height of the row will then be determined using the largest height required to display the content of any columns with the `auto_height` option.

``` python
table(penguins, width=550, columns=[
    column("species", flex=1),
    column("island", flex=1.2, auto_height=True),
    column("sex", width=70),
    column("body_mass", flex=1)])
```

You can also use the `header_auto_height` option to specify columns that will automatically size the header row height.

## Sorting

Each column in the table is sortable by clicking on the header for the column you’d like to sort. Each click cycles between sorting ascending, sorting descending, and not sorting. Holding `shift` while clicking will add the clicked column as a secondary sort, preserving any other sorts that have already been specified.

You can disable sorting for the entire table using the `sorting` argument:

``` python
table(penguins, sorting=False)
```

You can control whether individual columns can be sorted using the `sortable` option for `column`:

``` python
table(penguins, columns=[
    column("species", sortable=False),
    column("island"),
    column("sex"),
    column("body_mass")])
```

## Filtering

Each column of the table is filterable by clicking the filter icon in the header of the column. Depending upon the type of data in the column, different filtering options will be presented to the user. To disable filtering for a table, use `filtering`:

``` python
table(penguins, filtering=False)
```

You can control whether individual columns can be filtered using the `filterable` option for `column`:

``` python
table(penguins, columns=[
    column("species", filterable=False),
    column("island"),
    column("sex"),
    column("body_mass")])
```

#### Filter Location

You can control where in the table filters appear by passing either `header` or `row` as the value for `filtering`. `header` places the filters as buttons in the header row next to the header text. `row` creates a separate row with an inline filter UI for filtering columns.
For example:

``` python
table(penguins, filtering='row')
```

## Resizing

Each column of the table may be resized by the user by clicking and dragging the separator between columns in the header row. To make the table columns not resizable, use the `resizing` option:

``` python
table(penguins, resizing=False)
```

You can control whether individual columns can be resized using the `resizable` option for `column`:

``` python
table(penguins, columns=[
    column("species", resizable=False),
    column("island"),
    column("sex"),
    column("body_mass")])
```

## Pagination

When configured, tables can display pages of items with pagination controls at the bottom of the table rather than displaying all the items in a scrollable body. To enable pagination, provide the `pagination` argument to `table()`:

``` python
from inspect_viz.table import column, table

table(penguins, columns=[
    column("species"),
    column("island"),
    column("sex"),
    column("body_mass")],
    pagination=True)
```

By default, the table will automatically set the page size to use the available space in the table without scrolling. You can also explicitly choose page size and page size options:

``` python
from inspect_viz.table import column, table, Pagination

table(penguins, columns=[
    column("species"),
    column("island"),
    column("sex"),
    column("body_mass")],
    pagination=Pagination(page_size=20, page_size_selector=[20,40,60]))
```

## Grouping

When displaying tabular data, it can be useful to group the data by specific fields. For example, to display a table with the average attributes of male and female penguins based upon their species, you can apply aggregating transforms to some columns:

``` python
from inspect_viz.transform import avg, count

table(penguins, height=120, columns=[
    column("species"),
    column(avg("body_mass")),
    column(avg("flipper_length"))])
```

When providing transforms to apply to columns (e.g. `avg`, `sum`), columns without aggregating transforms will be treated as columns to group by.
So in the above example, the table is grouped by `species`, and the remaining columns display their aggregate values.

## Literal Data

You can also pass literal values (an `int | float | bool`) as a column by passing one or more values as the `column` itself. For example:

``` python
table(penguins, columns=[
    column([1,2,3,4,5,6,7,8,9], label="sample_bucket"),
    column("species"),
    column("body_mass"),
    column("flipper_length")])
```

If a single value is passed, that value will be repeated for every row in the dataset. If a list of values is passed, each row takes the value at its row index in the list. If the list is shorter than the dataset, values will be repeated by cycling through the list.

## Selection

By default, the table will display the selection provided by the data source. If you’d like, you can provide an alternative selection by using `filter_by`.

#### Targeting Selections

It can be useful to use selected rows within a table to target a selection to be used elsewhere (for example, in highlighting points within a dot plot). To do this, use the `target` option to select the output selection. This will cause a selection clause of the form `column IN (rows)` to be added to the selection for each currently selected table row. For example:

``` python
table(penguins, target=selection)
```

You can use the `select` option to control how selection works within the table. By default, `select` is set to `single_row` which will allow selection of one row at a time by clicking the row. Other options are listed below:

| Option | Action |
|----|----|
| `hover` | The selection will be updated when the user’s mouse hovers over a row. |
| `single_row` | The selection will be updated when a single row is selected. The selected row will be highlighted. |
| `multiple_row` | The selection will be updated when one or more rows are selected. The selected rows will be highlighted. |
| `single_row_checkbox` | The selection will be updated when a single row is selected using a checkbox. |
| `multiple_row_checkbox` | The selection will be updated when one or more rows are selected using a checkbox. |

## Appearance

Tables have a minimal default appearance using the [AG Grid](https://www.ag-grid.com/javascript-data-grid/themes/) `Balham` theme. If the table is being displayed in a [Quarto](https://www.quarto.org) page or dashboard, it will automatically inherit the theme of the page on which it is hosted.

You can customize most aspects of the table appearance using the `style` argument:

``` python
from inspect_viz.table import column, table, Pagination, TableStyle

table(penguins, columns=[
    column("species"),
    column("island"),
    column("sex"),
    column("body_mass")],
    style=TableStyle(
        background_color="#FCFAFF",
        foreground_color="purple",
        accent_color="#E8FFB3"))
```

### Color

The following three basic color options provide new colors for the table (the overall colors of the table are derived from these three values). Each of these options accepts a `css` color value (for example a hex color string or a named value like `red`).

| Option | Target |
|----|----|
| `background_color` | The background color used for cells. |
| `foreground_color` | The foreground color used for values within cells. |
| `accent_color` | Accent color used for things like selection and highlights. |

In addition to the previous basic options, you can further customize the colors by passing a `css` color value to the following:

| Option | Target |
|----|----|
| `text_color` | The text color for UI elements presented within the table. |
| `header_text_color` | The color for text in the header row. |
| `cell_text_color` | The color for text in cells within the body of the table. |
| `selected_row_background_color` | The background color of selected rows. |

### Fonts

You can control the fonts used by the table by passing a `css` `font-family` value in the following options:

| Option | Target |
|----------------------|------------------------------------------------------|
| `font_family` | The default font for all text within the table. |
| `header_font_family` | The font used for text within the header row. |
| `cell_font_family` | The font used for text within the body of the table. |

### Border

You can control the border of the table using the following border options:

| Options | Target |
|-----------------|----------------------------------------------------|
| `border_color` | The color of the border (a `css` color value). |
| `border_width` | The width in pixels of the border. |
| `border_radius` | The border radius in pixels. |

### Spacing

The `spacing` option controls how tightly data and UI elements are packed together in the table. All the padding within the table is defined relative to this value, so changing this value will affect the spacing of everything in the table. By default, tables have `4` pixels of spacing. To change this value, pass the number of pixels like so:

``` python
table(penguins, columns=[
    column("species"),
    column("island"),
    column("sex"),
    column("body_mass")],
    style=TableStyle(spacing=20))
```

# Inputs

Inputs are used to create interactive visualisations by targeting either `Param` values or `Selection` ranges. Available inputs include:

- `select()`
- `slider()`
- `search()`
- `checkbox()`
- `radio_group()`
- `checkbox_group()`

All inputs can write updates to a provided `Param` or `Selection`. Param values are updated to match the input value. Selections are provided a predicate clause. This linking can be bidirectional: an input component will also subscribe to a param and track its value updates.

## Select

The `select()` input is used to select one or more values from a list.
The list can be sourced from a data column (via the `column` parameter) or be static (via the `options` parameter).

Here is a select input bound to a column (this input targets the default `Selection` associated with the passed `Data`):

``` python
from inspect_viz import Data
from inspect_viz.input import select

penguins = Data.from_file("penguins.parquet")
select(penguins, label="Species", column="species")
```

Note that by default the select has no value and so does no filtering (represented by the default “All” selection). If you want a `select()` to have an initial value then specify it using the `value` parameter (or pass `"auto"` to select the first value):

``` python
select(penguins, label="Species", column="species", value="auto")
```

Here is a select input with explicit options (this input targets a `Param`):

``` python
from inspect_viz import Param

fruit = Param("Apple")
select(label="Fruit", options=["Apple", "Orange", "Banana"], target=fruit)
```

Pass `multiple=True` to enable selecting multiple values. In this case values are specified via typing/autocomplete rather than a drop down menu.

``` python
athletes = Data.from_file("athletes.parquet")
select(athletes, label="Sports", column="sport", multiple=True)
```

## Slider

The `slider()` input enables specification of either a single numeric value or a range of values. Here we enable the selection of a maximum body mass for a column:

``` python
from inspect_viz.input import slider

slider(penguins, label="Max Body Mass", column="body_mass")
```

Pass `select="interval"` to specify an interval rather than a single value:

``` python
slider(penguins, label="Body Mass Range", column="body_mass", select="interval")
```

Sliders can also target a `Param` and have explicit `min`, `max`, and `step` values:

``` python
bias = Param(0.5)
slider(label="Bias", min=0, max=1.0, step=0.1, target=bias)
```

## Search

The `search()` input enables filtering a dataset based on text matching.
For example, this input filters by WNBA player name:

``` python
from inspect_viz.input import search

players = Data.from_file("wnba-shots-2023.parquet")
search(players, label="Athlete", column="athlete_name")
```

## Checkbox

The `checkbox()` input enables toggling a binary value. You can either target boolean `Param` values or provide custom values mapped to checked and unchecked.

``` python
from inspect_viz.input import checkbox

enabled = Param(True)
checkbox(label="Enabled", target=enabled)
```

Here we provide custom values that map to checked and unchecked states:

``` python
bias = Param(0.1)
checkbox(label="Use Bias", values=[0.0, 0.1], target=bias)
```

## Radio Group

The `radio_group()` is an alternative to `select()` which displays all of the available options rather than collapsing them into a menu:

``` python
from inspect_viz.input import radio_group

penguins = Data.from_file("penguins.parquet")
radio_group(penguins, label="Species", column="species")
```

Or targeting a `Param`:

``` python
fruit = Param("Apple")
radio_group(label="Fruit", options=["Apple", "Orange", "Banana"], target=fruit)
```

## Checkbox Group

The `checkbox_group()` provides an interface to select multiple values (similar to `select(..., multiple=True)`):

``` python
from inspect_viz.input import checkbox_group

penguins = Data.from_file("penguins.parquet")
checkbox_group(penguins, label="Species", column="species")
```

You can also use `checkbox_group()` with a `Param`:

``` python
fruit = Param(["Apple", "Orange"])
checkbox_group(label="Fruit", options=["Apple", "Orange", "Banana"], target=fruit)
```

# Interactivity

## Overview

Inspect Viz supports interactive filtering and cross-filtering of plot data based on [Inputs](components-inputs.qmd) and [Interactors](#interactors). Filtering is done based on *Selections*: each `Data` table has a built-in selection and you can also create `Selection` instances for more sophisticated behaviors.
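As a rough mental model of how inputs, interactors, and selections fit together, the plain-Python sketch below treats a selection as a set of predicate clauses, one per contributing source, that are combined into a single filter. Everything here is hypothetical and for illustration only: `SketchSelection`, its methods, and the source names are assumptions, not the `Selection` class that Inspect Viz provides. The `cross` flag models cross-filtering, where a client is not filtered by its own clause.

``` python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SketchSelection:
    """Toy model of a selection (an assumption, not the library's class)."""

    cross: bool = False
    clauses: Dict[str, str] = field(default_factory=dict)

    def update(self, source: str, predicate: Optional[str]) -> None:
        """Record (or clear, when predicate is None) the clause for a source."""
        if predicate is None:
            self.clauses.pop(source, None)
        else:
            self.clauses[source] = predicate

    def predicate(self, client: Optional[str] = None) -> Optional[str]:
        """Combine the active clauses with AND; in cross mode skip the client's own."""
        active = [
            p for s, p in self.clauses.items()
            if not (self.cross and s == client)
        ]
        return " AND ".join(f"({p})" for p in active) if active else None


sel = SketchSelection(cross=True)
sel.update("team_select", "team_name = 'Team A'")
sel.update("athlete_select", "athlete_name = 'Player 1'")
print(sel.predicate("plot"))         # a plot sees both clauses
print(sel.predicate("team_select"))  # the team input ignores its own clause
```

With `cross=False` the same object behaves like a simple intersecting filter in which every client sees every clause.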
## Filtering

The most straightforward usage of selections is adding inputs which filter the data displayed in a plot. This filtering uses the *built in* selection of `Data` instances. For example, here we add a `select()` input to enable filtering by species:

``` python
from inspect_viz import Data
from inspect_viz.plot import plot
from inspect_viz.mark import dot
from inspect_viz.input import select
from inspect_viz.layout import vconcat

penguins = Data.from_file("penguins.parquet")

vconcat(
    select(penguins, label="Species", column="species"),
    plot(
        dot(penguins, x="body_mass", y="flipper_length",
            stroke="species", symbol="species"),
        legend="symbol",
        color_domain="fixed"
    )
)
```

- Line 9: The `vconcat()` function stacks the select input on top of the plot.
- Line 10: Select input bound to the “species” column.
- Line 12: Use of `penguins` in both `select()` and `plot()` automatically binds to the default selection for the penguins `Data` object.
- Line 15: A fixed color domain ensures that species colors remain the same even when filtered.

### Fixed Domain

The example above introduces an important concept when dealing with selections and filtering: `"fixed"` scale domains (in this case `color_domain="fixed"`). Fixed scale domains instruct a plot to first calculate a scale domain in a data-driven manner, but then keep that domain fixed across subsequent updates. Fixed domains enable stable configurations without requiring a hard-wired domain to be known in advance, preventing disorienting scale domain “jumps” that hamper comparison across filter interactions.

Several of the examples below will use `"fixed"` domains to provide this stability across interactions.

## Params

As illustrated above, inputs can be used to filter dataset selections. Inputs can also be used to set `Param` values that make various aspects of plots dynamic.
For example, here is a density plot of flight delays which uses a `slider()` input to vary the amount of smoothing by setting the kernel bandwidth:

``` python
from inspect_viz import Param
from inspect_viz.input import slider
from inspect_viz.mark import density_y

flights = Data.from_file("flights.parquet")

bandwidth = Param(0.1)

vconcat(
    slider(
        label="Bandwidth (σ)", target=bandwidth,
        min=0.1, max=100, step=0.1
    ),
    plot(
        density_y(
            flights, x="delay", fill="steelblue", bandwidth=bandwidth
        ),
        x_domain="fixed",
        y_axis=None,
        height=250,
    )
)
```

- Line 7: Create a `bandwidth` parameter with a default value of 0.1.
- Line 11: Bind the `slider()` to the `bandwidth` parameter.
- Line 16: Apply the `bandwidth` to the plot (the plot automatically redraws when the bandwidth changes).

## Interactors

*Interactors* imbue plots with interactive behavior. Most interactors listen to input events from rendered plot SVG elements to update bound [*selections*](reference/inspect_viz.qmd#selection). Interactors take facets into account to properly handle input events across subplots.

### Interval

The `interval_x()` and `interval_y()` interactors create 1D interval brushes. The `interval_xy()` interactor creates a 2D brush. Interval interactors accept a `pixel_size` parameter that sets the brush resolution: values may snap to a grid whose bins are larger than screen pixels and this can be leveraged to optimize query latency.

For example, below we stack two plots vertically, a `dot()` plot along with a `bar_x()` plot that counts the `sex` column. We then add an `interval_x()` interactor that enables us to filter the dataset using selections on the dot plot.
``` python
from inspect_viz import Data, Selection
from inspect_viz.interactor import Brush, interval_x
from inspect_viz.layout import vconcat
from inspect_viz.plot import plot
from inspect_viz.mark import bar_x, dot, regression_y
from inspect_viz.transform import count

athletes = Data.from_file("athletes.parquet")

range = Selection.intersect()

vconcat(
    plot(
        dot(athletes, x="weight", y="height", fill="sex", opacity=0.1),
        regression_y(athletes, x="weight", y="height", stroke="sex"),
        interval_x(
            target=range,
            brush=Brush(fill="none", stroke="#888")
        ),
        legend="color"
    ),
    plot(
        bar_x(
            athletes, filter_by=range,
            x=count(), y="sex", fill="sex"
        ),
        y_label=None,
        height=150,
        x_domain="fixed"
    )
)
```

- Line 10: A `Selection` is a means of filtering datasets based on interactions. Here we use an “intersect” selection to apply a simple filter from the dot plot to the bar plot.
- Line 17: The `range` selection is set via the `interval_x()` interactor (which enables using the mouse to select an x-range).
- Line 18: The `Brush` defines the color of the interactor (in this case `#888`, a medium gray).
- Line 24: The `range` selection is consumed using the `filter_by` parameter.
- Line 29: We set the `x_domain` for the bar plot to “fixed” so that the scale doesn’t change as the dataset is filtered.

Try using the mouse to brush over regions on the dot plot; the bar plot will update accordingly.

### Toggle

The `toggle()` interactor selects individual points (e.g., by click or shift-click) and generates a selection clause over specified fields of those points. Directives such as `toggle_color()`, `toggle_x()`, and `toggle_y()` simplify specification of which channel fields are included in the resulting predicates.

The `highlight()` interactor updates the rendered state of a visualization in response to a `Selection`. Non-selected points are set to translucent, neutral gray, or other specified visual properties. Selected points maintain normal encodings.
This example demonstrates using the `toggle_y()` and `highlight()` interactors to render a bar chart that can be clicked to select a subset of points on the dot plot above it. The dot plot legend also targets the same selection to make itself interactive.

``` python
from inspect_viz import Data, Selection
from inspect_viz.interactor import highlight, toggle_y
from inspect_viz.plot import legend, plot
from inspect_viz.mark import bar_x, dot
from inspect_viz.layout import vconcat
from inspect_viz.transform import count, date_month_day

seattle = Data.from_file("seattle-weather.parquet")

weather = Selection.single()

vconcat(
    plot(
        dot(
            data=seattle,
            filter_by=weather,
            x=date_month_day("date"),
            y="temp_max",
            fill="weather",
            fill_opacity=0.7,
            r="precipitation",
        ),
        legend=legend("color", target=weather),
        x_tick_format="%b",
        color_domain="fixed",
        r_domain="fixed",
        r_range=[2, 10]
    ),
    plot(
        bar_x(seattle, x=count(), y="weather", fill="weather"),
        toggle_y(target=weather),
        highlight(by=weather),
        x_domain="fixed",
        y_label=None,
        height=200
    )
)
```

- Line 10: Single selection (filter out all other points).
- Line 16: The dot plot should filter by the selection.
- Line 21: Show precipitation level using dot radius.
- Line 23: Clicks on the legend target the same selection.
- Line 31: `toggle_y()` interactor to filter by weather.
- Line 32: `highlight()` interactor to fade out unselected bars.

Try clicking either the legend or the bar plot elements to filter the dot plot.

## Crossfilter

In many cases you’ll want to have an input or interactor that both consumes and produces the same selection (i.e. filtered based on interactions with other inputs or interactors, but also able to provide its own filtering).

### Inputs

This example demonstrates crossfiltering across [inputs](reference/inspect_viz.input.qmd).
We plot shot types taken during the 2023 WNBA season, providing a `select()` input that filters by team, and another `select()` input that filters by player (which in turn is also filtered by the currently selected team). Click on the numbers at right for additional explanation of the code. ``` python from inspect_viz import Data, Selection from inspect_viz.input import select from inspect_viz.layout import vconcat, hconcat from inspect_viz.mark import bar_x from inspect_viz.plot import plot from inspect_viz.transform import count shots = Data.from_file("wnba-shots-2023.parquet") filter = Selection.crossfilter() vconcat( hconcat( select( shots, label="Team", column="team_name", target=filter ), select( shots, label="Athlete", column="athlete_name", filter_by=filter, target=filter ) ), plot( bar_x( shots, filter_by=filter, x=count(), y="category", fill="category" ), y_label=None, color_domain="fixed", y_domain=["Jump", "Layup", "Hook"], height=200, margin_left=60 ) ) ``` Line 10 Create a crossfilter selection, which enables inputs to both consume and produce the same selection (conditioning their available choices on other inputs). Line 16 The team select box targets the `filter` selection (filtering both the choices in the athelte select box and what is displayed in the plot). Line 20 The athlete select box is both *filtered by* and targets the `filter` selection, enabling it to both confine itself to the selected team as well as filter what is displayed in the plot. Lines 29-30 As different teams and players are selected, the y-axis may take on differnet values and ordering. These options ensure that the y-axis remains stable across selections. ### Interactors This example demonstrates crossfiltering across plot [interactors](reference/inspect_viz.interactor.qmd). We plot histograms showing arrival delay and departure time for flights. When you select a range in one plot, the other plot updates to show only the data within that selection—and vice versa. 
This bidirectional filtering is achieved using `Selection.crossfilter()`, which ensures each plot’s selection affects all other plots except itself. Click on the numbers at right for additional explanation of the code. ``` python from inspect_viz import Data, Selection from inspect_viz.mark import rect_y from inspect_viz.layout import vconcat from inspect_viz.plot import plot from inspect_viz.transform import count, bin from inspect_viz.interactor import interval_x flights = Data.from_file("flights.parquet") brush = Selection.crossfilter() def flights_plot(x, label): return plot( rect_y( flights, filter_by=brush, x=bin(x), y=count(), fill="steelblue" ), interval_x(target=brush), height=200, x_label=label, x_domain="fixed", y_tick_format="s" ) vconcat( flights_plot("delay", "Arrival Delay (min)"), flights_plot("time", "Departure Time (hour)") ) ``` Line 10 Create a crossfilter selection, which ensures each plot’s selection affects all other plots except itself. Line 12 Our two plots are identical save for the `x` value and the `x_label` so factor out into a function. Line 18 The `interval_x()` interactor enables horizontal selection (targeting the crossfiltering `brush`). Line 21 Use a `"fixed"` domain so that the x-axis remains stable even when being filtered. Try selecting a horizontal range on either or both of the bar plots. # Scores by Task ## Overview The `scores_by_task()` function renders a bar plot for comparing eval scores. ``` python from inspect_viz import Data from inspect_viz.view.beta import scores_by_task evals = Data.from_file("evals.parquet") scores_by_task(evals) ``` ## Data Preparation Above we read the data for the plot from a parquet file. This file was in turn created by: 1. Reading logs into a data frame with [`evals_df()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#evals_df). 2. 
Using the [`prepare()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#prepare) function to add [`model_info()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#model_info) and [`log_viewer()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#log_viewer) columns to the data frame.

``` python
from inspect_ai.analysis import evals_df, log_viewer, model_info, prepare

df = evals_df("logs")
df = prepare(df,
    model_info(),
    log_viewer("eval", {"logs": "https://samples.meridianlabs.ai/"}),
)
df.to_parquet("evals.parquet")
```

You can additionally use the [`task_info()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#task_info) operation to map lower-level task names to task display names (e.g. “gpqa_diamond” -\> “GPQA Diamond”).

Note that both the log viewer links and model names are optional (the plot will render without links and use raw model strings if the data isn’t prepared with `log_viewer()` and `model_info()`).

## Function Reference

Bar plot for comparing eval scores.

Summarize eval scores using a bar plot. By default, scores (`y`) are plotted by “task_display_name” (`fx`) and “model_display_name” (`x`). By default, confidence intervals are also plotted (disable this with `ci=False`).

[Source](https://github.com/meridianlabs-ai/inspect_viz/blob/4f22634e35c5dd4410d75f3db2210791c92d61f9/src/inspect_viz/view/beta/_scores_by_task.py#L18)

``` python
def scores_by_task(
    data: Data,
    model_name: str = "model_display_name",
    task_name: str = "task_display_name",
    score_value: str = "score_headline_value",
    score_stderr: str = "score_headline_stderr",
    score_label: str | None | NotGiven = NOT_GIVEN,
    ci: bool | float = 0.95,
    title: str | Title | None = None,
    marks: Marks | None = None,
    width: float | Param | None = None,
    height: float | Param | None = None,
    **attributes: Unpack[PlotAttributes],
) -> Component
```

`data` [Data](reference/inspect_viz.qmd#data) Evals data table.
This is typically created using a data frame read with the inspect `evals_df()` function.

`model_name` str Name of field for the model name (defaults to “model_display_name”).

`task_name` str Name of field for the task name (defaults to “task_display_name”).

`score_value` str Name of field for the score value (defaults to “score_headline_value”).

`score_stderr` str Name of field for stderr (defaults to “score_headline_stderr”).

`score_label` str \| None \| NotGiven Score axis label (pass None for no label).

`ci` bool \| float Confidence interval (e.g. 0.80, 0.90, 0.95, etc.). Defaults to 0.95.

`title` str \| [Title](reference/inspect_viz.mark.qmd#title) \| None Title for plot (`str` or mark created with the `title()` function).

`marks` [Marks](reference/inspect_viz.mark.qmd#marks) \| None Additional marks to include in the plot.

`width` float \| [Param](reference/inspect_viz.qmd#param) \| None The outer width of the plot in pixels, including margins. Defaults to 700.

`height` float \| [Param](reference/inspect_viz.qmd#param) \| None The outer height of the plot in pixels, including margins. The default is width / 1.618 (the [golden ratio](https://en.wikipedia.org/wiki/Golden_ratio)).

`**attributes` Unpack\[[PlotAttributes](reference/inspect_viz.plot.qmd#plotattributes)\] Additional `PlotAttributes`. By default, the `margin_bottom` is set to 10 pixels and `x_ticks` is set to `[]`.

## Implementation

The [Scores by Task](examples/inspect/scores-by-task/index.qmd) example demonstrates how this view was implemented using lower level plotting components.

# Scores by Model

## Overview

The `scores_by_model()` function creates a horizontal bar plot for comparing the scores of different models on a single evaluation, with one or more baselines overlaid as vertical lines.
``` python
from inspect_viz import Data
from inspect_viz.view.beta import scores_by_model
from inspect_viz.mark import baseline

evals = Data.from_file("agi-lsat-ar.parquet")
scores_by_model(evals, marks=baseline(0.697, label="Human"))
```

## Data Preparation

Above we read the data for the plot from a parquet file. This file was in turn created by:

1. Reading logs into a data frame with [`evals_df()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#evals_df).

2. Using the [`prepare()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#prepare) function to add [`model_info()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#model_info) and [`log_viewer()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#log_viewer) columns to the data frame.

``` python
from inspect_ai.analysis import evals_df, log_viewer, model_info, prepare

df = evals_df("logs")
df = prepare(df,
    model_info(),
    log_viewer("eval", {"logs": "https://samples.meridianlabs.ai/"}),
)
df.to_parquet("agi-lsat-ar.parquet")
```

You can additionally use the [`task_info()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#task_info) operation to map lower-level task names to task display names (e.g. “gpqa_diamond” -\> “GPQA Diamond”).

Note that both the log viewer links and model names are optional (the plot will render without links and use raw model strings if the data isn’t prepared with `log_viewer()` and `model_info()`).

## Function Reference

Bar plot for comparing the scores of different models on a single evaluation.

Summarize eval scores using a bar plot. By default, scores (`x`) are plotted by “model_display_name” (`y`). By default, confidence intervals are also plotted (disable this with `y_ci=False`).
[Source](https://github.com/meridianlabs-ai/inspect_viz/blob/4f22634e35c5dd4410d75f3db2210791c92d61f9/src/inspect_viz/view/beta/_scores_by_model.py#L18)

``` python
def scores_by_model(
    data: Data,
    *,
    model_name: str = "model_display_name",
    score_value: str = "score_headline_value",
    score_stderr: str = "score_headline_stderr",
    ci: float = 0.95,
    sort: Literal["asc", "desc"] | None = None,
    score_label: str | None | NotGiven = None,
    model_label: str | None | NotGiven = None,
    color: str | None = None,
    title: str | Title | None = None,
    marks: Marks | None = None,
    width: float | None = None,
    height: float | None = None,
    **attributes: Unpack[PlotAttributes],
) -> Component
```

`data` [Data](reference/inspect_viz.qmd#data) Evals data table. This is typically created using a data frame read with the inspect `evals_df()` function.

`model_name` str Column containing the model name (defaults to “model_display_name”).

`score_value` str Column containing the score value (defaults to “score_headline_value”).

`score_stderr` str Column containing the score standard error (defaults to “score_headline_stderr”).

`ci` float Confidence interval (e.g. 0.80, 0.90, 0.95, etc.). Defaults to 0.95.

`sort` Literal\['asc', 'desc'\] \| None Sort order for the bars (sorts using the ‘x’ value). Can be “asc” or “desc”. Defaults to “asc”.

`score_label` str \| None \| NotGiven x-axis label (defaults to None).

`model_label` str \| None \| NotGiven y-axis label (defaults to None).

`color` str \| None The color for the bars. Defaults to “\#416AD0”. Pass any valid hex color value.

`title` str \| [Title](reference/inspect_viz.mark.qmd#title) \| None Title for plot (`str` or mark created with the `title()` function).

`marks` [Marks](reference/inspect_viz.mark.qmd#marks) \| None Additional marks to include in the plot.

`width` float \| None The outer width of the plot in pixels, including margins. Defaults to 700.

`height` float \| None The outer height of the plot in pixels, including margins.
The default is width / 1.618 (the [golden ratio](https://en.wikipedia.org/wiki/Golden_ratio)).

`**attributes` Unpack\[[PlotAttributes](reference/inspect_viz.plot.qmd#plotattributes)\] Additional `PlotAttributes`. By default, the `y_inset_top` and `margin_bottom` are set to 10 pixels and `x_ticks` is set to `[]`.

## Implementation

The [Scores by Model](examples/inspect/scores-by-model/index.qmd) example demonstrates how this view was implemented using lower level plotting components.

# Scores by Factor

## Overview

The `scores_by_factor()` function renders a bar plot for comparing eval scores by model and a boolean factor (e.g. non-reasoning vs. reasoning, no hint vs. hint, etc.).

``` python
from inspect_viz import Data
from inspect_viz.view.beta import scores_by_factor

evals = Data.from_file("evals-hint.parquet")
scores_by_factor(evals, "task_arg_hint", ("No hint", "Hint"))
```

## Data Preparation

Above we read the data for the plot from a parquet file. This file was in turn created by:

1. Reading logs into a data frame with [`evals_df()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#evals_df).

2. Using the [`prepare()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#prepare) function to add [`model_info()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#model_info) and [`log_viewer()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#log_viewer) columns to the data frame.

``` python
from inspect_ai.analysis import evals_df, log_viewer, model_info, prepare

df = evals_df("logs")
df = prepare(df,
    model_info(),
    log_viewer("eval", {"logs": "https://samples.meridianlabs.ai/"}),
)
df.to_parquet("evals-hint.parquet")
```

You can additionally use the [`task_info()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#task_info) operation to map lower-level task names to task display names (e.g. “gpqa_diamond” -\> “GPQA Diamond”).
You should also ensure that your evals data frame has a boolean field corresponding to the factor you are splitting on (in the example above this is “task_arg_hint”).

## Function Reference

Summarize eval scores with a factor of variation (e.g. ‘No hint’ vs. ‘Hint’).

[Source](https://github.com/meridianlabs-ai/inspect_viz/blob/4f22634e35c5dd4410d75f3db2210791c92d61f9/src/inspect_viz/view/beta/_scores_by_factor.py#L13)

``` python
def scores_by_factor(
    data: Data,
    factor: str,
    factor_labels: tuple[str, str],
    score_value: str = "score_headline_value",
    score_stderr: str = "score_headline_stderr",
    score_label: str = "Score",
    model: str = "model",
    model_label: str = "Model",
    ci: bool | float = 0.95,
    color: str | tuple[str, str] = "#3266ae",
    title: str | Mark | None = None,
    marks: Marks | None = None,
    width: float | Param | None = None,
    height: float | Param | None = None,
    **attributes: Unpack[PlotAttributes],
) -> Component
```

`data` [Data](reference/inspect_viz.qmd#data) Evals data table. This is typically created using a data frame read with the inspect `evals_df()` function.

`factor` str Field with factor of variation (should be of type boolean).

`factor_labels` tuple\[str, str\] Tuple of labels for factor of variation. `False` value should be first, e.g. `("No hint", "Hint")`.

`score_value` str Name of field for x (scoring) axis (defaults to “score_headline_value”).

`score_stderr` str Name of field for scoring stderr (defaults to “score_headline_stderr”).

`score_label` str Label for x-axis (defaults to “Score”).

`model` str Name of field for y axis (defaults to “model”).

`model_label` str Label for y axis (defaults to “Model”).

`ci` bool \| float Confidence interval (e.g. 0.80, 0.90, 0.95, etc.). Defaults to 0.95.

`color` str \| tuple\[str, str\] Hex color value (or tuple of two values). If one value is provided the second is computed by lightening the main color.
`title` str \| [Mark](reference/inspect_viz.mark.qmd#mark) \| None Title for plot (`str` or mark created with the `title()` function).

`marks` [Marks](reference/inspect_viz.mark.qmd#marks) \| None Additional marks to include in the plot.

`width` float \| [Param](reference/inspect_viz.qmd#param) \| None The outer width of the plot in pixels, including margins. Defaults to 700.

`height` float \| [Param](reference/inspect_viz.qmd#param) \| None The outer height of the plot in pixels, including margins. Defaults to 65 pixels for each item on the “y” axis.

`**attributes` Unpack\[[PlotAttributes](reference/inspect_viz.plot.qmd#plotattributes)\] Additional `PlotAttributes`.

## Implementation

The [Scores by Factor](examples/inspect/scores-by-factor/index.qmd) example demonstrates how this view was implemented using lower level plotting components.

# Scores Timeline

## Overview

The `scores_timeline()` function plots eval scores by model, organization, and release date[^1]:

``` python
from inspect_viz import Data
from inspect_viz.view.beta import scores_timeline

evals = Data.from_file("benchmarks.parquet")
scores_timeline(evals)
```

## Data Preparation

Above we read the data for the plot from a parquet file. This file was in turn created by:

1. Reading logs into a data frame with [`evals_df()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#evals_df).

2. Using the [`prepare()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#prepare) function to add [`model_info()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#model_info), [`frontier()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#frontier) and [`log_viewer()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#log_viewer) columns to the data frame.
``` python
from inspect_ai.analysis import (
    evals_df, frontier, log_viewer, model_info, prepare
)

df = evals_df("logs")
df = prepare(df,
    model_info(),
    frontier(),
    log_viewer("eval", {"logs": "https://samples.meridianlabs.ai/"}),
)
df.to_parquet("benchmarks.parquet")
```

## Filtering

A `select()` input for tasks is automatically provided if more than one task exists in the `data`. A `checkbox_group()` is automatically provided for organizations if more than one organization exists (the `filters` parameter controls which of these inputs are shown; for example, `filters=["task"]` omits the organization filter).

When multiple organizations exist, clicking on the legend for an organization will filter the plot by that organization.

## Function Reference

Eval scores by model, organization, and release date.

[Source](https://github.com/meridianlabs-ai/inspect_viz/blob/4f22634e35c5dd4410d75f3db2210791c92d61f9/src/inspect_viz/view/beta/_scores_timeline.py#L28)

``` python
def scores_timeline(
    data: Data,
    task_name: str = "task_display_name",
    model_name: str = "model_display_name",
    model_organization: str = "model_organization_name",
    model_release_date: str = "model_release_date",
    score_name: str = "score_headline_name",
    score_value: str = "score_headline_value",
    score_stderr: str = "score_headline_stderr",
    organizations: list[str] | None = None,
    filters: bool | list[Literal["task", "organization"]] = True,
    ci: float | bool = 0.95,
    time_label: str = "Release Date",
    score_label: str = "Score",
    eval_label: str = "Eval",
    title: str | Title | None = None,
    marks: Marks | None = None,
    width: float | Param | None = None,
    height: float | Param | None = None,
    regression: bool = False,
    legend: Legend | NotGiven | None = NOT_GIVEN,
    **attributes: Unpack[PlotAttributes],
) -> Component
```

`data` [Data](reference/inspect_viz.qmd#data) Data read using `evals_df()` and amended with model metadata using the `model_info()` prepare operation (see [Data Preparation](https://inspect.aisi.org.uk/dataframe.html#data-preparation) for details).
`task_name` str Column for task name (defaults to “task_display_name”). `model_name` str Column for model name (defaults to “model_display_name”). `model_organization` str Column for model organization (defaults to “model_organization_name”). `model_release_date` str Column for model release date (defaults to “model_release_date”). `score_name` str Column for scorer name (defaults to “score_headline_name”). `score_value` str Column for score value (defaults to “score_headline_value”). `score_stderr` str Column for score stderr (defaults to “score_headline_stderr”) `organizations` list\[str\] \| None List of organizations to include (in order of desired presentation). `filters` bool \| list\[Literal\['task', 'organization'\]\] Provide UI to filter plot by task and organization(s). `ci` float \| bool Confidence interval (defaults to 0.95, pass `False` for no confidence intervals) `time_label` str Label for time (x-axis). `score_label` str Label for score (y-axis). `eval_label` str Label for eval select input. `title` str \| [Title](reference/inspect_viz.mark.qmd#title) \| None Title for plot (`str` or mark created with the `title()` function). `marks` [Marks](reference/inspect_viz.mark.qmd#marks) \| None Additional marks to include in the plot. `width` float \| [Param](reference/inspect_viz.qmd#param) \| None The outer width of the plot in pixels, including margins. Defaults to 700. `height` float \| [Param](reference/inspect_viz.qmd#param) \| None The outer height of the plot in pixels, including margins. The default is width / 1.618 (the [golden ratio](https://en.wikipedia.org/wiki/Golden_ratio)) `regression` bool If `True`, adds a regression line to the plot (uses the confidence interval passed using ci). Defaults to False. `legend` [Legend](reference/inspect_viz.plot.qmd#legend) \| NotGiven \| None Legend to use for the plot (defaults to `None`, which uses the default legend). 
`**attributes` Unpack\[[PlotAttributes](reference/inspect_viz.plot.qmd#plotattributes)\] Additional `PlotAttributes`. By default, the `x_domain` is set to “fixed”, the `y_domain` is set to `[0,1.0]`, `color_label` is set to “Organizations”, and `color_domain` is set to `organizations`.

## Implementation

The [Scores Timeline](examples/inspect/scores-timeline/index.qmd) example demonstrates how this view was implemented using lower level plotting components.

[^1]: This plot was inspired by and includes data from the [Epoch AI](https://epoch.ai/data/ai-benchmarking-dashboard) Benchmarking Hub

# Scores Heatmap

## Overview

The `scores_heatmap()` function renders a heatmap for comparing eval scores.

``` python
from inspect_viz import Data
from inspect_viz.view.beta import scores_heatmap

evals = Data.from_file("evals.parquet")
scores_heatmap(evals, height=200, legend=True)
```

## Data Preparation

Above we read the data for the plot from a parquet file. This file was in turn created by:

1. Reading logs into a data frame with [`evals_df()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#evals_df).

2. Using the [`prepare()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#prepare) function to add [`model_info()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#model_info) and [`log_viewer()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#log_viewer) columns to the data frame.

``` python
from inspect_ai.analysis import evals_df, log_viewer, model_info, prepare

df = evals_df("logs")
df = prepare(df,
    model_info(),
    log_viewer("eval", {"logs": "https://samples.meridianlabs.ai/"}),
)
df.to_parquet("evals.parquet")
```

You can additionally use the [`task_info()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#task_info) operation to map lower-level task names to task display names (e.g. “gpqa_diamond” -\> “GPQA Diamond”).
Note that both the log viewer links and model names are optional (the plot will render without links and use raw model strings if the data isn’t prepared with `log_viewer()` and `model_info()`). ## Function Reference Creates a heatmap plot of success rate of eval data. [Source](https://github.com/meridianlabs-ai/inspect_viz/blob/4f22634e35c5dd4410d75f3db2210791c92d61f9/src/inspect_viz/view/beta/_scores_heatmap.py#L33) ``` python def scores_heatmap( data: Data, task_name: str = "task_display_name", task_label: str | None | NotGiven = None, model_name: str = "model_display_name", model_label: str | None | NotGiven = None, score_value: str = "score_headline_value", cell: CellOptions | None = None, tip: bool = True, title: str | Title | None = None, marks: Marks | None = None, height: float | None = None, width: float | None = None, legend: Legend | bool | None = None, sort: Literal["ascending", "descending"] | SortOrder | None = "ascending", orientation: Literal["horizontal", "vertical"] = "horizontal", **attributes: Unpack[PlotAttributes], ) -> Component ``` `data` [Data](reference/inspect_viz.qmd#data) Evals data table. `task_name` str Name of column to use for columns. `task_label` str \| None \| NotGiven x-axis label (defaults to None). `model_name` str Name of column to use for rows. `model_label` str \| None \| NotGiven y-axis label (defaults to None). `score_value` str Name of the column to use as values to determine cell color. `cell` [CellOptions](reference/inspect_viz.view.qmd#celloptions) \| None Options for the cell marks. `tip` bool Whether to show a tooltip with the value when hovering over a cell (defaults to True). `title` str \| [Title](reference/inspect_viz.mark.qmd#title) \| None Title for plot (`str` or mark created with the `title()` function) `marks` [Marks](reference/inspect_viz.mark.qmd#marks) \| None Additional marks to include in the plot. `height` float \| None The outer height of the plot in pixels, including margins. 
The default is width / 1.618 (the [golden ratio](https://en.wikipedia.org/wiki/Golden_ratio)).

`width` float \| None The outer width of the plot in pixels, including margins. Defaults to 700.

`legend` [Legend](reference/inspect_viz.plot.qmd#legend) \| bool \| None Options for the legend. Pass None to disable the legend.

`sort` Literal\['ascending', 'descending'\] \| SortOrder \| None Sort order for the x and y axes. If ascending, the highest values will be sorted to the top right. If descending, the highest values will appear in the bottom left. If None, no sorting is applied. If a SortOrder is provided, it will be used to sort the x and y axes.

`orientation` Literal\['horizontal', 'vertical'\] The orientation of the heatmap. If “horizontal”, the tasks will be on the x-axis and models on the y-axis. If “vertical”, the tasks will be on the y-axis and models on the x-axis.

`**attributes` Unpack\[[PlotAttributes](reference/inspect_viz.plot.qmd#plotattributes)\] Additional `PlotAttributes`.

## Implementation

The [Scores Heatmap](examples/inspect/scores-heatmap/index.qmd) example demonstrates how this view was implemented using lower level plotting components.

# Tool Calls

## Overview

The `tool_calls()` function creates a heat map visualising tool calls over evaluation turns.
``` python from inspect_viz import Data from inspect_viz.view.beta import tool_calls tools = Data.from_file("cybench_tools.parquet") tool_calls(tools) ``` ## Data Preparation To create the plot we read a raw messages data frame from an eval log using the [`messages_df()`](https://inspect.aisi.org.uk/reference/inspect_ai.analysis.html#messages_df) function, then filter down to just the fields we require for visualization: ``` python from inspect_ai.analysis import messages_df, log_viewer, model_info, prepare, EvalModel, MessageColumns, SampleSummary # read messages from log log = ".eval" # Be sure to add EvalModel column so links can be prepared df = messages_df(log, columns=EvalModel + SampleSummary + MessageColumns) # trim columns df = df[[ "eval_id", "sample_id", "message_id", "model", "id", "order", "tool_call_function", "limit", "log" ]] # prepare the data frame with model info and log links df = prepare(df, [ model_info(), log_viewer("message", url_mappings={ "logs": "https://samples.meridianlabs.ai/" }) ]) # write to parquet df.to_parquet("cybench_tools.parquet") ``` Note that the trimming of columns is particularly important because Inspect Viz embeds datasets directly in the web pages that host them (so we want to minimize their size for page load performance and bandwidth usage). ## Function Reference Heat map visualising tool calls over evaluation turns. 
[Source](https://github.com/meridianlabs-ai/inspect_viz/blob/4f22634e35c5dd4410d75f3db2210791c92d61f9/src/inspect_viz/view/beta/_tool_calls.py#L15) ``` python def tool_calls( data: Data, x: str = "order", y: str = "id", tool: str = "tool_call_function", limit: str = "limit", tools: list[str] | None = None, x_label: str | None = "Message", y_label: str | None = "Sample", title: str | Title | None = None, marks: Marks | None = None, width: float | None = None, height: float | None = None, **attributes: Unpack[PlotAttributes], ) -> Component ``` `data` [Data](reference/inspect_viz.qmd#data) Messages data table. This is typically created using a data frame read with the inspect `messages_df()` function. `x` str Name of field for x axis (defaults to “order”) `y` str Name of field for y axis (defaults to “id”). `tool` str Name of field with tool name (defaults to “tool_call_function”) `limit` str Name of field with sample limit (defaults to “limit”). `tools` list\[str\] \| None Tools to include in plot (and order to include them). Defaults to all tools found in `data`. `x_label` str \| None x-axis label (defaults to “Message”). `y_label` str \| None y-axis label (defaults to “Sample”). `title` str \| [Title](reference/inspect_viz.mark.qmd#title) \| None Title for plot (`str` or mark created with the `title()` function) `marks` [Marks](reference/inspect_viz.mark.qmd#marks) \| None Additional marks to include in the plot. `width` float \| None The outer width of the plot in pixels, including margins. Defaults to 700. `height` float \| None The outer height of the plot in pixels, including margins. The default is width / 1.618 (the [golden ratio](https://en.wikipedia.org/wiki/Golden_ratio)) `**attributes` Unpack\[[PlotAttributes](reference/inspect_viz.plot.qmd#plotattributes)\] Additional `PlotAttributes`. 
By default, the `margin_top` is set to 0, `margin_left` to 20, `margin_right` to 100, `color_label` is “Tool”, `y_ticks` is empty, and `x_ticks` and `color_domain` are calculated from `data`. ## Implementation The [Tool Calls](examples/inspect/tool-calls/index.qmd) example demonstrates how this view was implemented using lower level plotting components. # Scores by Task This example illustrates the code behind the [`scores_by_task()`](../../../view-scores-by-task.qmd) pre-built view function. If you want to include this plot in your notebooks or websites you should start with that function rather than the lower-level code below. **Code** ``` python from inspect_viz import Data from inspect_viz.plot import plot, legend from inspect_viz.mark import bar_y, TipOptions, text, title from inspect_viz.transform import sql evals = Data.from_file("evals.parquet") plot( bar_y( evals, x="model", fx="task_name", y="score_headline_value", channels= { "Log Viewer": "log_viewer" }, fill="model", tip=True ), title=title("Plot Title", margin_top=40), legend=legend("color", frame_anchor="bottom"), x_label=None, fx_label=None, x_ticks=[], y_label="score", y_domain=[0, 1.0], color_label="Model" ) ``` Line 12 Facet the x-axis (i.e. create multiple groups of bars) by task name. Line 14 Add a channel with links to the Inspect log files (links appear in the tooltip). Line 20 We don’t need an explicit “model” or “task_name” label as they are obvious from context. We also don’t need ticks b/c the fill color and legend provide this. Line 21 Ensure that y-axis shows the full range of scores (by default it caps at the maximum). #### Confidence Interval Here we add a confidence interval for each reported score by adding a `rule_x()` mark. Note that we derive the confidence interval transforms using the `ci_bounds()` function. 
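The text does not spell out the formula behind `ci_bounds()`. A common choice, assumed here, is a normal-approximation interval: score ± z × stderr, where z is the two-sided critical value for the requested level. The `normal_ci()` helper below is hypothetical and uses only the Python standard library. It illustrates the arithmetic and is not the library's implementation.

``` python
from statistics import NormalDist


def normal_ci(score: float, stderr: float, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation interval (an assumed formula, for illustration only)."""
    # two-sided critical value, roughly 1.96 when level is 0.95
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return score - z * stderr, score + z * stderr


# a score of 0.70 with a standard error of 0.02
lower, upper = normal_ci(score=0.70, stderr=0.02)
print(f"{lower:.3f} to {upper:.3f}")
```

Under this assumption a wider level (for example 0.99) gives a larger z and therefore a longer rule for the same standard error.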
**Code** ``` python from inspect_viz.mark import rule_x from inspect_viz.transform import sql, ci_bounds # confidence interval bounds ci_lower, ci_upper = ci_bounds( score="score_headline_value", level=0.95, stderr="score_headline_stderr" ) plot( bar_y( evals, x="model", fx="task_name", y="score_headline_value", channels= { "Log Viewer": "log_viewer" }, fill="model", tip=True ), rule_x( evals, x="model", fx="task_name", y1=ci_lower, y2=ci_upper, stroke="black", marker="tick-x", ), legend=legend("color", frame_anchor="bottom"), x_label=None, fx_label=None, x_ticks=[], y_label="score", y_domain=[0, 1.0], color_label="Model" ) ``` Lines 4,9 Use the `ci_bounds()` bounds function to create transforms that we will pass for `y1` and `y2`. Lines 19,27 Draw the confidence interval using a `rule_x()` mark. # Scores by Model This example illustrates the code behind the [`scores_by_model()`](../../../view-scores-by-model.qmd) pre-built view function. If you want to include this plot in your notebooks or websites you should start with that function rather than the lower-level code below. The plot summarizes the scores of a single evaluation task, showing performance for 13 different models. Models are ordered based upon their headline score (defaulting to descending). 
**Code** ``` python from inspect_viz import Data from inspect_viz.plot import plot from inspect_viz.mark import rule_y, baseline from inspect_viz.transform import ci_bounds evals = Data.from_file("agi-lsat-ar.parquet") ci_lower, ci_upper = ci_bounds( score="score_headline_value", level=0.95, stderr="score_headline_stderr" ) plot( rule_y( evals, x="score_headline_value", y="model", sort={"y": "x", "reverse": True}, stroke_width=4, stroke_linecap="round", marker_end="circle", tip=True, stroke="#416AD0", ), rule_y( evals, x1=ci_lower, x2=ci_upper, y="model", sort={"y": "x", "reverse": True}, stroke="#416AD020", stroke_width=15, ), baseline(0.78, label="Human"), margin_left=225, y_label=None, x_label="Score", x_domain=[0, 1.0] ) ``` Lines 8-12 Create transforms for upper and lower CI bounds. Lines 15-25 This draws the core bar chart, sorting the y-axis by the value of x (descending). Lines 26-34 This draws the error bars using the upper and lower bounds. Line 35 Add a mark for human baseline. Line 36 Ensure there is room for model names in the left margin. Line 39 Ensure that the x axis always goes to 1.0 (even if scores are below that). # Scores by Factor This example illustrates the code behind the [`scores_by_factor()`](../../../view-scores-by-factor.qmd) pre-built view function. If you want to include this plot in your notebooks or websites you should start with that function rather than the lower-level code below. 
**Code**

``` python
from inspect_viz import Data
from inspect_viz.mark import frame, rule_y
from inspect_viz.plot import legend, plot
from inspect_viz.transform import ci_bounds, sql

evals = Data.from_file("evals.csv")

# factor colors/labels
fx_colors = ["#3266ae", "#a6c0e5"]
fx_labels = ["No hint", "Hint"]

# confidence interval transforms
ci_lower, ci_upper = ci_bounds(
    score="score_headline_value",
    level=0.95,
    stderr="score_headline_stderr"
)

# compute plot height (65 pixels per model)
height = 65 * len(evals.column_unique("model_display_name"))

plot(
    frame("left", inset_top=5, inset_bottom=5),
    rule_y(
        evals,
        x="score_headline_value",
        y="task_arg_hint",
        fy="model_display_name",
        sort={"fy": "-x"},
        stroke=sql(f"IF(NOT task_arg_hint, '{fx_labels[0]}', '{fx_labels[1]}')"),
        stroke_width=3,
        stroke_linecap="round",
        marker_end="circle",
        tip=True,
        channels={
            "Model": "model_display_name",
            "Hint": "task_arg_hint",
            "Score": "score_headline_value",
            "Stderr": "score_headline_stderr"
        },
    ),
    rule_y(
        evals,
        x1=ci_lower,
        x2=ci_upper,
        y="task_arg_hint",
        fy="model_display_name",
        stroke=f"{fx_colors[0]}20",
        stroke_width=15,
    ),
    legend=legend("color", target=evals.selection),
    x_label="Score",
    y_label=None,
    y_ticks=[],
    y_tick_size=0,
    fy_label=None,
    fy_axis="left",
    color_domain=fx_labels,
    color_range=fx_colors,
    margin_top=0,
    margin_left=100,
    height=height
)
```

Lines 9-10
Factors need to define a dark/light color and labels for their `False` and `True` states.

Line 20
Compute plot height based on the number of unique models.

Line 23
Sets off each model with its own horizontal axis line.

Line 29
Order models on the y-axis from highest to lowest score.

Lines 44-45
Confidence interval using the specified stderr column.

Line 51
Clickable legend to filter the view by factor value.

Lines 53-55
Y-axis labels and ticks are already covered by the factor and `frame()`.

Lines 58-59
Map legend and colors to the factor.

Line 61
Leave room for model names.
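The `stroke=f"{fx_colors[0]}20"` expression in the code above appends two hex digits to a six-digit color. In CSS-style `#RRGGBBAA` notation those trailing digits are the alpha channel, so `20` yields a faint band behind the main rule. The sketch below shows the arithmetic; it assumes the plot interprets eight-digit hex colors the way CSS does, and the helper names are hypothetical.

``` python
def with_alpha(hex_color: str, alpha_hex: str) -> str:
    """Append a two-digit hex alpha channel to a #RRGGBB color."""
    if len(hex_color) != 7 or not hex_color.startswith("#"):
        raise ValueError("expected a color of the form #RRGGBB")
    if len(alpha_hex) != 2:
        raise ValueError("expected a two-digit hex alpha value")
    int(alpha_hex, 16)  # raises ValueError if the alpha is not valid hex
    return hex_color + alpha_hex


def alpha_to_opacity(alpha_hex: str) -> float:
    """Convert a two-digit hex alpha value to an opacity between 0 and 1."""
    return int(alpha_hex, 16) / 255


band_color = with_alpha("#3266ae", "20")
band_opacity = alpha_to_opacity("20")
print(band_color, round(band_opacity, 3))
```

An alpha of `20` (32 in decimal) works out to roughly 12.5% opacity, which keeps the wide confidence band from obscuring the score line drawn on top of it.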
# Scores Timeline

This example illustrates the code behind the [`scores_timeline()`](../../../view-scores-timeline.qmd) pre-built view function. If you want to include this plot in your notebooks or websites you should start with that function rather than the lower-level code below.

The example also relies on some [data preparation](../../../view-scores-timeline.qmd#data-preparation) steps to annotate the raw evals data with shorter model names and a “frontier” column which drives the inclusion of text labels for scores that set a new high water mark.

**Code**

``` python
from inspect_viz import Data, Selection
from inspect_viz.input import checkbox_group, select
from inspect_viz.layout import vconcat, vspace
from inspect_viz.plot import plot, legend
from inspect_viz.mark import dot, rule_x, text, regression_y
from inspect_viz.table import table
from inspect_viz.transform import ci_bounds, epoch_ms

# read data
evals = Data.from_file("benchmarks.parquet")

# transforms to compute ci bounds from score and stderr columns
ci_lower, ci_upper = ci_bounds(
    score="score_headline_value",
    level=0.95,
    stderr="score_headline_stderr"
)

vconcat(
    # select benchmark
    select(evals, label="Eval: ", column="task_name",
           value="GPQA Diamond", width=425),

    # filter models by organization(s)
    checkbox_group(evals, column="model_organization_name"),

    # dot plot w/ error bars
    vspace(15),
    plot(
        # benchmark score
        dot(
            evals,
            x=epoch_ms("model_release_date"),
            y="score_headline_value",
            r=3,
            fill="model_organization_name",
            channels= {
                "Model": "model_display_name",
                "Scorer": "score_headline_name",
                "Stderr": "score_headline_stderr",
                "Log Viewer": "log_viewer"
            }
        ),
        # confidence interval
        rule_x(
            evals,
            x=epoch_ms("model_release_date"),
            y="score_headline_value",
            y1=ci_lower,
            y2=ci_upper,
            stroke="model_organization_name",
            stroke_opacity=0.4,
            marker="tick-x",
        ),
        # regression line
        regression_y(
            evals,
            x=epoch_ms("model_release_date"),
            y="score_headline_value",
            stroke="#AAAAAA"
        ),
        # frontier annotation
        text(
            evals,
            text="model_display_name",
            x=epoch_ms("model_release_date"),
            y="score_headline_value",
            line_anchor="middle",
            frame_anchor="right",
            filter="frontier",
            dx=-4,
            fill="model_organization_name",
        ),
        legend=legend("color", target=evals.selection),
        x_domain="fixed",
        y_domain=[0,1.0],
        x_label="Release Date",
        y_label="Score",
        color_label="Organization",
        color_domain="fixed",
        x_tick_format="%b. %Y",
        grid=True,
    )
)
```

Line 10
Benchmark data sourced from [Epoch AI](https://epoch.ai/data/ai-benchmarking-dashboard).

Lines 13,17
Create transforms used to compute the confidence intervals for each point.

Line 33
Use `epoch_ms` to convert the date into a numeric timestamp so it can be used to compute the regression.

Lines 37,42
Additional channels are added to the tooltip.

Line 52
Confidence interval: computed dynamically using the `ci_bounds()` transforms, colored by organization, and drawn with reduced opacity.

Line 65
Text annotations are automatically moved to avoid collisions.

Line 70
Only show annotations for records with `frontier=True`.

Line 74
Specifying `target` makes the legend clickable.

Lines 75-76
Domains: `x_domain` is fixed so that the axes don’t jump around when the organization selection changes; `y_domain` should always span up to 1.0.

Line 81
Use a tick format to turn the x-axis value (which is a numeric timestamp) into a readable date string.

This plot was inspired by and includes data from the [Epoch AI](https://epoch.ai/data/ai-benchmarking-dashboard) Benchmarking Hub.

# Scores Heatmap

This example illustrates the code behind the [`scores_heatmap()`](../../../view-scores-heatmap.qmd) pre-built view function. If you want to include this plot in your notebooks or websites you should start with that function rather than the lower-level code below.
**Code**

``` python
from inspect_viz import Data
from inspect_viz.plot import plot
from inspect_viz.mark import cell, text

evals_data = Data.from_file("evals.parquet")

plot(
    cell(
        evals_data,
        x="task_name",
        y="model",
        fill="score_headline_value",
        tip=True,
        inset=1,
        sort={
            "y": {"value": "fill", "reduce": "sum", "reverse": True},
            "x": {"value": "fill", "reduce": "sum", "reverse": False},
        },
    ),
    text(
        evals_data,
        x="task_name",
        y="model",
        text="score_headline_value",
        fill="white",
        styles={"font_weight": 600},
    ),
    padding=0,
    color_scheme="viridis",
    height=250,
    margin_left=150,
    x_label=None,
    y_label=None
)
```

Line 8
The `cell` mark draws the cells, positioning each cell along the x and y axes using the fields specified in `x` and `y`.

Line 12
The cell’s color is determined by the field specified in `fill`.

Line 14
The cell inset controls the space between cells.

Lines 15-18
Sorting the cells is important in a heatmap so that colors are grouped along the x and y axes. In this case, we sort using the sum of the rows and columns, placing the highest values at the top right and the lowest values at the bottom left.

Line 20
Place the value as text centered in the cell.

Line 28
Remove plot padding so that the inset alone controls spacing between cells.

# Tool Calls

This example illustrates the code behind the [`tool_calls()`](../../../reference/inspect_viz.view.qmd#tool_calls) pre-built view function. If you want to include this plot in your notebooks or websites you should start with that function rather than the lower-level code below.

The plot visualizes tool usage over a series of turns in a Cybench evaluation. We use a `cell()` mark to visualize tool use over messages in each sample of an evaluation. We note any limit that ended the sample using a `text()` mark on the right side of the frame.
**Code**

``` python
from inspect_viz import Data
from inspect_viz.plot import plot, legend
from inspect_viz.mark import cell, text
from inspect_viz.transform import first

# read data (see 'Data Preparation' below)
data = Data.from_file("cybench_tools.parquet")

tools = ["bash", "python", "submit"]

plot(
    cell(
        data,
        x="order",
        y="id",
        fill="tool_call_function"
    ),

    text(
        data,
        text=first("limit"),
        y="id",
        frame_anchor="right",
        font_size=8,
        font_weight=200,
        dx=50
    ),
    legend=legend("color", frame_anchor="right"),
    margin_top=0,
    margin_left=20,
    margin_right=100,
    x_ticks=list(range(0, 400, 80)),
    y_ticks=[],
    x_label="Message",
    y_label="Sample",
    color_label="Tool",
    color_domain=tools
)
```

Line 7
Read tool call data (see [Data Preparation](../../../view-tool-calls.qmd#data-preparation) for details).

Lines 12,17
`cell()` mark showing tool calls.

Lines 19,27
`text()` mark showing whether the sample terminated due to a limit.

Lines 29,31
Tweak the margins so the axis labels and text annotations appear correctly.

Lines 32-33
Reduce the number of tick marks on the x-axis and eliminate y-ticks.

Lines 34-36
Set some custom labels for the axes and the color legend.

Line 37
Specify which tools we should show and in what order.

# Penguins Explorer

Use the species drop down to see only points for a particular species. Use the x and y drop downs to explore different variables.
**Code**

``` python
from inspect_viz import Data, Param
from inspect_viz.input import select
from inspect_viz.layout import hconcat, vconcat
from inspect_viz.mark import dot
from inspect_viz.plot import plot, legend
from inspect_viz.table import table

penguins = Data.from_file("penguins.parquet")

axes = ["body_mass", "flipper_length", "bill_depth", "bill_length"]
x_axis = Param("body_mass")
y_axis = Param("flipper_length")

vconcat(
    hconcat(
        select(penguins, label="Species", column="species"),
        select(label="X", options=axes, target=x_axis),
        select(label="Y", options=axes, target=y_axis)
    ),
    plot(
        dot(penguins, x=x_axis, y=y_axis, stroke="species", symbol="species"),
        grid=True,
        x_label="Body mass (g) →",
        y_label="↑ Flipper length (mm)",
        legend="color"
    ),
)
```

# Bias Parameter

Use the slider to create bias offsets for the y-axis.

**Code**

``` python
from inspect_viz import Data, Param
from inspect_viz.input import slider
from inspect_viz.mark import area_y
from inspect_viz.layout import vconcat
from inspect_viz.plot import plot
from inspect_viz.transform import sql

random_walk = Data.from_file("random-walk.parquet")
bias = Param(100)

vconcat(
    slider(label="Bias", target=bias, min=0, max=1000, step=1),
    plot(area_y(random_walk, x="t", y=sql(f"v + {bias}"), fill="steelblue"))
)
```

# Seattle Weather

Select a horizontal range on the dot plot to filter the contents of the bar plot. Click the legend or the bar plot to filter by weather conditions.
**Code** ``` python from inspect_viz import Data, Selection from inspect_viz.interactor import Brush, highlight, interval_x, toggle_y from inspect_viz.plot import legend, plot, plot_defaults from inspect_viz.mark import bar_x, dot from inspect_viz.layout import vconcat from inspect_viz.transform import count, date_month_day # data seattle = Data.from_file("seattle-weather.parquet") # plot defaults for domain and range weather = ["sun", "fog", "drizzle", "rain", "snow"] plot_defaults( color_domain=weather, color_range=["#e7ba52", "#a7a7a7", "#aec7e8", "#1f77b4", "#9467bd"] ) # selections (scatter x-range and bar/legend click) range = Selection("intersect") click = Selection("single") vconcat( plot( dot( data=seattle, filter_by=click, x=date_month_day("date"), y="temp_max", fill="weather", fill_opacity=0.7, r="precipitation", ), interval_x(target=range, brush=Brush(fill="none", stroke="#888")), highlight(by=range, fill="#ccc", fill_opacity=0.2), legend=legend("color", target=click), xy_domain="fixed", x_tick_format="%b", r_domain="fixed", r_range=[2, 10] ), plot( bar_x(seattle, x=count(), y="weather", fill="#ccc", fill_opacity=0.2), bar_x(seattle, filter_by=range, x=count(), y="weather", fill="weather"), toggle_y(target=click), highlight(by=click), x_domain="fixed", y_domain=weather, y_label=None, height=200 ) ) ``` # Athletes (Regression) Use the drop downs to filter by sport or sex. Select a range on the plot to filter the table and see the regression lines for the selected range. Hover over the table to highlight the corresponding point on the plot. 
**Code** ``` python from inspect_viz import Data, Selection from inspect_viz.input import search, select from inspect_viz.interactor import Brush, interval_xy from inspect_viz.layout import hconcat, vconcat from inspect_viz.mark import TextStyles, dot, regression_y, text from inspect_viz.plot import plot from inspect_viz.table import column, table athletes = Data.from_file("athletes.parquet") category = Selection.intersect() query = Selection.intersect(include=category) hover = Selection.intersect(empty=True) vconcat( hconcat( select(athletes, label="Sport", column="sport", target=category), select(athletes, label="Sex", column="sex", target=category), ), plot( text( text=["Olympic Athletes"], frame_anchor="top", styles=TextStyles(font_size=14), dy=-20 ), dot( athletes, filter_by=query, x="weight", y="height", fill="sex", r=2, opacity=0.1, ), regression_y(athletes, filter_by=query, x="weight", y="height", stroke="sex"), interval_xy(target=query, brush=Brush(fill_opacity=0, stroke="black")), dot( athletes, filter_by=hover, x="weight", y="height", fill="sex", stroke="currentColor", stroke_width=1, r=3 ), xy_domain="fixed", r_domain="fixed", color_domain="fixed" ), table( athletes, filter_by=query, target=hover, columns=[ column("name", width=200), "sex", "height", "weight", "sport" ], ) ) ``` # Athletes (Error Bars) Confidence intervals of Olympic athlete heights, in meters. Data are batched into groups of 10 samples per sport. Use the samples slider to see how the intervals update as the sample size increases (as in [online aggregation](https://en.wikipedia.org/wiki/Online_aggregation)). For each sport, the numbers on the right show the maximum number of athletes in the full dataset. 
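The batching described above (groups of 10 samples per sport) can be sketched in plain Python. This is only an illustration of the idea with made-up rows; the helper name `assign_batches` is hypothetical and not part of Inspect Viz.

``` python
import math
from collections import defaultdict


def assign_batches(sports: list[str], size: int = 10) -> list[int]:
    """Label each row with the upper bound of its per-sport batch (10, 20, ...)."""
    seen: dict[str, int] = defaultdict(int)
    batches = []
    for sport in sports:
        seen[sport] += 1  # 1-based position of this row within its sport
        batches.append(size * math.ceil(seen[sport] / size))
    return batches


# hypothetical input: 12 judo rows followed by 3 rowing rows
rows = ["judo"] * 12 + ["rowing"] * 3
batches = assign_batches(rows)
print(batches)
```

Filtering on `batch <= n` then selects the first `n` rows of every sport, which is what lets a slider grow the sample size in steps of 10.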
**Code** ``` python import pandas as pd import numpy as np from inspect_viz import Data, Param, Selection from inspect_viz.mark import error_bar_x, text from inspect_viz.plot import plot, legend from inspect_viz.transform import count from inspect_viz.input import slider from inspect_viz.layout import hconcat, vconcat, vspace # prepare data (create batch column so we can target various numbers of samples) df = pd.read_parquet("athletes.parquet") df = df[df['height'].notna()] df['row_num'] = df.groupby('sport').cumcount() + 1 df['batch'] = 10 * np.ceil(df['row_num'] / 10).astype(int) df = df.drop('row_num', axis=1) df = df.reset_index(drop=True) athletes = Data.from_dataframe(df) ci = Param(0.95) query = Selection.single() vconcat( hconcat( slider( athletes, label="Samples", select="interval", target=query, column="batch", step=10, value=(0,20) ), slider( label="Conf.", target=ci, min=0.5, max=0.999, value=0.95, step=0.001 ) ), plot( error_bar_x( athletes, filter_by=query, ci=ci, x="height", y="sport", stroke="sex", stroke_width=1, marker="tick", sort={ "y": "-x"} ), text( athletes, text=count(), y="sport", dx=25, frame_anchor="right", font_size=8, fill="#999" ), legend=legend("color", frame_anchor="bottom"), x_domain=[1.5,2.1], y_domain="fixed", y_grid=True, y_label=None, margin_top=0, margin_left=105 ) ) ``` # Plots ## Overview There are several ways to publish plots created with Inspect Viz: 1. Use the `write_png()` function to create an image for use in a document or presentation. 2. Use the `write_html()` function to create a standalone HTML file. 3. Create and share a Jupyter notebook with the plot. 4. Embed the plot in a website or dashboard. This article will cover the first two options for sharing standalone plots—the [Notebooks](publishing-notebooks.qmd), [Websites](publishing-websites.qmd), and [Dashboards](publishing-dashboards.qmd) articles will cover the other possibilities. 
## Plot as Image

Use the `write_png()` function to save a PNG version of any plot.

> [!NOTE]
>
> The `write_png()` function requires that you install the
> [playwright](https://playwright.dev/python/) Python package, which
> enables taking screenshots of web graphics using an embedded version
> of the Chromium web browser. You can do this as follows:
>
> ``` bash
> pip install playwright
> playwright install
> ```

To create a plot and export it as a PNG:

``` python
from inspect_viz import Data
from inspect_viz.mark import dot
from inspect_viz.plot import plot, write_png

penguins = Data.from_file("penguins.parquet")

pl = plot(
    dot(penguins, x="body_mass", y="flipper_length",
        stroke="species", symbol="species"),
    legend="symbol",
    grid=True
)

write_png("penguins.png", pl)
```

Here is the plot that was written (note that since this plot is a static PNG file rather than a JavaScript widget it does not have tooltips):

![](penguins.png)

### Plot Size

You can control the size of the image written by specifying the `width` and `height` directly in the call to `plot()`. For example:

``` python
pl = plot(
    dot(penguins, x="body_mass", y="flipper_length",
        stroke="species", symbol="species"),
    legend="symbol",
    grid=True,
    width=900,
    height=400
)
```

### Export Options

There are a couple of other options that can be used when exporting plots to PNG:

| Option | Description |
|----|----|
| `scale` | Device scale to capture plot at. Use 2 (the default) for retina quality images suitable for high resolution displays or print output. |
| `padding` | Padding (in pixels) to add around exported plot. Defaults to 8 pixels. |

## Plot as HTML

You can also create an HTML version of a plot using the `write_html()` function.
For example:

``` python
from inspect_viz import Data
from inspect_viz.mark import dot
from inspect_viz.plot import plot, write_html

penguins = Data.from_file("penguins.parquet")

pl = plot(
    dot(penguins, x="body_mass", y="flipper_length",
        stroke="species", symbol="species"),
    legend="symbol",
    grid=True
)

write_html("penguins.html", pl)
```

Unlike with `write_png()`, the exported HTML plot retains all interactive features (tooltips, filters, etc.). In some cases you might therefore also include [inputs](components-inputs.qmd) with your plot. For example:

``` python
from inspect_viz.input import select
from inspect_viz.layout import vconcat

pl = vconcat(
    select(penguins, label="Species", column="species"),
    plot(
        dot(penguins, x="body_mass", y="flipper_length",
            stroke="species", symbol="species"),
        legend="symbol",
        color_domain="fixed"
    )
)

write_html("penguins.html", pl)
```

# Notebooks

## Overview

A convenient way to share sets of plots and related commentary is to publish a notebook. There are a couple of straightforward ways to create HTML documents from notebooks ([Quarto](#quarto) and [nbconvert](#nbconvert)), and then these documents can in turn be printed to PDF if required.

You can also share a live version of a notebook that supports filtering and plot interactions by publishing it on a platform like [Google Colab](https://colab.research.google.com/).

## Quarto

The [Quarto](https://quarto.org) publishing system can also convert notebooks to HTML. To install the `quarto-cli` Python package:

``` bash
pip install quarto-cli
```

Then, convert any notebook which includes Inspect Viz plots as follows:

``` bash
quarto render notebook.ipynb --to html --execute
```

This will create an HTML file named “notebook.html” and a directory named “notebook_files” alongside the “notebook.ipynb”.

> [!IMPORTANT]
>
> The `--execute` flag is required to ensure that all Inspect Viz
> outputs are properly rendered (as some notebook front ends like VS
> Code don’t properly cache Jupyter Widget outputs).
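If you render several notebooks, it can be easy to forget the `--execute` flag on one of them. The sketch below shows one way to build the same `quarto render` command for a list of notebooks from Python. The helper name and the notebook paths are hypothetical, and the commands are only printed here rather than run.

``` python
import shlex


def quarto_render_command(notebook: str) -> list[str]:
    """Build the render command shown above, always including --execute."""
    return ["quarto", "render", notebook, "--to", "html", "--execute"]


# hypothetical notebook paths, used only for illustration
notebooks = ["notebook.ipynb", "reports/summary.ipynb"]
commands = [quarto_render_command(nb) for nb in notebooks]

for command in commands:
    # to actually render, pass the list to subprocess.run(command, check=True)
    print(shlex.join(command))
```

Building the command as a list (rather than a single string) avoids shell quoting problems when a notebook path contains spaces.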
#### Preview

To work on a notebook with a live updating preview, use the `quarto preview` command:

``` bash
quarto preview notebook.ipynb --to html --execute
```

#### Code Blocks

You can also disable display of code blocks using the `-M echo:false` option:

``` bash
quarto render notebook.ipynb --to html --execute -M echo:false
```

If you need a PDF version of the notebook, open the file in a browser and print to PDF.

#### Publishing

You can use the [Quarto Publish](https://quarto.org/docs/publishing/) command to publish a notebook to GitHub Pages, Hugging Face Spaces, Netlify, or Quarto’s own publishing service.

To publish a notebook, pass it to the `quarto publish` command:

``` bash
quarto publish notebook.ipynb
```

## nbconvert

The [nbconvert](https://nbconvert.readthedocs.io/en/latest/) Python package enables export of any Jupyter notebook to HTML. Install `nbconvert` with:

``` bash
pip install nbconvert
```

Then, convert any notebook which includes Inspect Viz plots as follows:

``` bash
jupyter nbconvert --to html --execute notebook.ipynb
```

This will create an HTML file named “notebook.html” alongside the “notebook.ipynb”.

> [!IMPORTANT]
>
> The `--execute` flag is required to ensure that all Inspect Viz
> outputs are properly rendered (as some notebook front ends like VS
> Code don’t properly cache Jupyter Widget outputs).

#### Code Cells

You can also specify that you’d like code cells removed using the `--no-input` option:

``` bash
jupyter nbconvert --to html --execute --no-input notebook.ipynb
```

If you need a PDF version of the notebook, open the file in a browser and print to PDF.

# Websites

## Overview

If you want to publish one or more plots as part of a website there are two high-level approaches:

1.  Use a Jupyter-based website publishing system that supports interactive Jupyter Widgets (e.g. [Quarto](https://quarto.org)).

2.  Use the `to_html()` function to create embeddable HTML fragments for your plots and embed them in any website.

We’ll cover both of these approaches below.

## Quarto Websites

The [Quarto](https://quarto.org) publishing system can create websites that include dynamic output from Python code, including interactive Jupyter Widgets like the ones created by Inspect Viz. To install the `quarto-cli` Python package:

``` bash
pip install quarto-cli
```

Quarto is a markdown-based publishing system that enables you to embed executable Python blocks whose output is included in the published website. For instance, here is the source code for the [Bias Parameter](examples/general/bias-parameter/index.qmd) example:

```` python
---
title: "Bias Parameter"
echo: false
---

Use the slider to create bias offsets for the y-axis.

```{python}
from inspect_viz import Data, Param
from inspect_viz.input import slider
from inspect_viz.mark import area_y
from inspect_viz.plot import plot
from inspect_viz.transform import sql

random_walk = Data.from_file("random-walk.parquet")
bias = Param(0)
```

```{python}
slider(label="Bias", target=bias, min=0, max=1000, step=1, value=100)
```

```{python}
plot(area_y(random_walk, x="t", y=sql(f"v + {bias}"), fill="steelblue"))
```
````

Line 3
We specify `echo: false` to prevent display of code blocks.

Lines 8-17
Markdown code blocks decorated with `{python}` are executed.

Here is what the page looks like when rendered on the website (it’s a screenshot so you won’t be able to use the slider!):

![](bias-parameter.png)

### Notebook Execution

When using Inspect Viz with Quarto Websites you should always add the following configuration to your `_quarto.yml` to specify that notebooks should be fully executed when rendered:

**\_quarto.yml**

``` yaml
execute:
  enabled: true
```

### Learning More

The website was created with Quarto and includes many live Inspect Viz plots and tables.
The source code for the [Examples](https://github.com/meridianlabs-ai/inspect_viz/tree/main/docs/examples) section is a good place to start to understand the basics.

The documentation on [Quarto Websites](https://quarto.org/docs/websites/) includes a tutorial and many additional details on creating, customizing, and publishing websites.

[Quarto Dashboards](publishing-dashboards.qmd) are a special type of Quarto website optimized for displaying many plots and tables together, so are also worth considering.

## HTML Fragments

If you are working with an existing website or with another website publishing system, it is also straightforward to embed HTML snippets which include Inspect Viz plots and tables.

### Single Plot

To create a standalone snippet which you can include in any website, use the `write_html()` function:

``` python
from inspect_viz import Data
from inspect_viz.mark import dot
from inspect_viz.plot import plot, write_html

penguins = Data.from_file("penguins.parquet")

pl = plot(
    dot(penguins, x="body_mass", y="flipper_length",
        stroke="species", symbol="species"),
    legend="symbol",
    grid=True
)

write_html("penguins.html", pl)
```

### Multiple Plots

If you want to include multiple plots on a page, you might find it more convenient to call the `to_html()` function as part of your website generation process.

By default, the returned HTML includes the Jupyter Widget runtime dependencies. If you have multiple plots you’ll instead want to include these dependencies once in the `<head>` of your document and then create HTML snippets without the dependencies.
Here is the dependencies code that you should place in the `<head>` tag:

``` html
```

Then, specify `dependencies=False` when you call `to_html()` to get only the plot and not the dependency scripts which are already in your `<head>` tag:

``` python
from inspect_viz.plot import to_html

pl = plot(
    dot(penguins, x="body_mass", y="flipper_length",
        stroke="species", symbol="species"),
    legend="symbol",
    grid=True
)

pl_html = to_html(pl, dependencies=False)

# ...include pl_html in your website
```

# Dashboards

## Overview

[Quarto Dashboards](https://quarto.org/docs/dashboards/) are a special type of Quarto website optimized for publishing easily navigable sets of plots and tables. Features of Quarto Dashboards include:

1.  Many flexible ways to lay out components (row or column based, tabsets, multiple pages, etc.) including responsive layout for mobile devices.

2.  A variety of ways to present inputs for interactivity including toolbars, sidebars, and card-level inputs.

3.  Dozens of available themes including the ability to create your own themes.

## Example

Here is the [Scores Timeline](examples/inspect/scores-timeline/index.qmd) example from this repository re-written as a dashboard (this is a live dashboard embedded as an iframe):

Below is the source code for this dashboard. You’ll notice that this looks quite similar to the code for any other Quarto document, but level-two headings (`##`) have been added to denote a toolbar and dashboard rows (additional headings could be used to create columns and tabsets).
```` python --- title: "Capabilities Timeline" format: dashboard --- ```{python} from inspect_viz import Data, Param from inspect_viz.input import select from inspect_viz.mark import dot from inspect_viz.plot import plot from inspect_viz.table import table, column from inspect_viz.view.beta import scores_timeline from inspect_viz.input import checkbox_group, select evals = Data.from_file("benchmarks.parquet") ``` ## {.sidebar} ```{python} select( evals, column="task_name", value="auto", label="Benchmark" ) checkbox_group( evals, column="model_organization_name", label="Organization" ) ``` *** Benchmark data from the Epoch AI [Benchmarking Hub](https://epoch.ai/data/ai-benchmarking-dashboard). ## Column ### Row {height=60%} ```{python} scores_timeline(evals, filters=False) ``` ### Row {height=40%} ```{python} table( evals, columns=[ column("model_organization_name", label="Organization"), column("model_display_name", label="Model"), column("model_release_date", label="Release Date"), column("score_headline_value", label="Score", width=100), column("score_headline_stderr", label="StdErr", width=100), ] ) ``` ```` ## Notebook Execution When using Inspect Viz with Quarto Websites you should always add the following configuration to your `_quarto.yml` to specify that notebooks should be fully executed when rendered: **\_quarto.yml** ``` yaml execute: enabled: true ``` ## Learning More - See the [Quarto Dashboards](https://quarto.org/docs/dashboards/) documentation for additional details on creating dashboards. - See the [Dashboard Examples](https://quarto.org/docs/gallery/#dashboards) to get an idea for the sorts of layouts and themes that are available and to see the source code for a variety of dashboard types. 
# PNG Output

## Overview

When publishing a [notebook](publishing-notebooks.qmd), [website](publishing-websites.qmd), or [dashboard](publishing-dashboards.qmd), Inspect Viz plots are rendered by default as Jupyter Widgets that use JavaScript to provide various interactive features (tooltips, filtering, brushing, etc.). While this is the recommended way to publish Inspect Viz content, you can also choose to render content as static PNG images.

You might want to do this if you are creating an Office or PDF document from a notebook, or want plots in a dashboard to be available even when disconnected from the Internet. Note however that rendering plots as PNG images takes longer than the native JavaScript output format, and that interactive features are not available in this mode.

#### Prerequisites

To create PNG output with Inspect Viz, first install the [playwright](https://playwright.dev/python/) Python package, which enables taking screenshots of web graphics using an embedded version of the Chromium web browser. You can do this as follows:

``` bash
pip install playwright
playwright install
```

## Standalone

Use the `write_png()` function to save a standalone PNG version of any plot. For example:

``` python
from inspect_viz import Data
from inspect_viz.mark import dot
from inspect_viz.plot import plot, write_png

penguins = Data.from_file("penguins.parquet")

pl = plot(
    dot(penguins, x="body_mass", y="flipper_length",
        stroke="species", symbol="species"),
    legend="symbol",
    grid=True
)

write_png("penguins.png", pl)
```

## Embedded

When your plots are embedded in a notebook or website, use the global `output_format` option to specify that you’d like to render them in PNG format.
For example, the plot below is rendered as a static PNG graphic:

``` python
from inspect_viz import Data, options
from inspect_viz.view.beta import scores_by_factor

# set 'png' as default output format
options.output_format = "png"

# render plot
evals = Data.from_file("evals-hint.parquet")
scores_by_factor(evals, "task_arg_hint", ("No hint", "Hint"))
```

Lines 4-5
Set the global `options.output_format` option to render all plots in a notebook or Quarto document as static PNG images.

You can also do this for a single plot or set of plots using `options_context()`:

``` python
from inspect_viz import options_context

with options_context(output_format="png"):
    # plot code here
    ...
```

Note that when rendering a PDF document with Quarto, the output format is automatically set to “png” (as PDFs can’t ever include interactive JavaScript content).
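To make the difference between the global option and the scoped `options_context()` form concrete, here is a generic sketch of how a scoped override typically behaves: the value applies inside the `with` block and the previous value is restored afterwards. This is not the Inspect Viz implementation. The `options` object, the `"js"` default value, and the `scoped_options` helper are hypothetical stand-ins.

``` python
from contextlib import contextmanager
from types import SimpleNamespace

# hypothetical stand-in for a global options object (not the inspect_viz one)
options = SimpleNamespace(output_format="js")


@contextmanager
def scoped_options(**overrides):
    """Temporarily override option values, restoring the old ones on exit."""
    previous = {name: getattr(options, name) for name in overrides}
    for name, value in overrides.items():
        setattr(options, name, value)
    try:
        yield options
    finally:
        # restore even if the body of the with block raised an exception
        for name, value in previous.items():
            setattr(options, name, value)


with scoped_options(output_format="png"):
    inside = options.output_format

after = options.output_format
print(inside, after)
```

The scoped form is useful when only a few plots in a document should be static images and everything else should remain interactive.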