Scores Radar By Metric

Dataset: writing_bench.parquet

This example illustrates the code behind the scores_radar_by_metric() pre‑built view function. If you want to include this plot in your notebooks or sites, start with that function rather than the lower‑level code below.

scores_radar_by_metric() is useful for comparing scores across multiple models and metrics from a single task with composite metrics. The data preparation function scales values for visualization by normalizing them with percentile ranks or min-max normalization; the raw values are shown in the tooltips.
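For reference, here is a minimal sketch of calling the pre-built view directly (the import path for scores_radar_by_metric() is an assumption based on the other inspect_viz.view.beta imports in this example):

from inspect_viz._core.data import Data
from inspect_viz.view.beta import scores_radar_by_metric  # assumed import path

# load the prepared dataset (see Data Preparation below)
data = Data.from_file("writing_bench.parquet")

# render the radar chart from the prepared data
scores_radar_by_metric(data)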

Code
from inspect_viz._core.data import Data
from inspect_viz._core.selection import Selection
from inspect_viz.mark import circle, line, text
from inspect_viz.plot import plot
from inspect_viz.plot._legend import legend
from inspect_viz.view.beta import LabelStyles
from inspect_viz.view.beta._scores_radar import (
    axes_coordinates,
    grid_circles_coordinates,
    labels_coordinates,
)


data = Data.from_file("writing_bench.parquet")

channels = {
    "Model": "model_display_name",
    "Metric": "metric",
    "Score": "value",
    "Log viewer": "log_viewer",
}

metrics = data.column_unique("metric")
axes = axes_coordinates(num_axes=len(metrics))
grid_circles = grid_circles_coordinates()
labels = labels_coordinates(labels=metrics)

# enable interactive highlighting of a chosen model
model_selection = Selection.single()

elements = [
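    # concentric grid circles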
    *[
        line(
            x=grid_circle["x"],
            y=grid_circle["y"],
            stroke="#e0e0e0",
        )
        for grid_circle in grid_circles
    ],
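    # radial axis spokes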
    line(
        x=axes["x"],
        y=axes["y"],
        stroke="#ddd",
    ),
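    # polygon outline per model, with tooltips, filtered by the selection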
    line(
        data,
        x="x",
        y="y",
        stroke="model_display_name",
        filter_by=model_selection,
        tip=True,
        channels=channels,
    ),
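    # faint outlines for all models as background context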
    line(
        data,
        x="x",
        y="y",
        stroke="model_display_name",
        stroke_opacity=0.4,
        tip=False,
    ),
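    # markers at each polygon vertex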
    circle(
        data,
        x="x",
        y="y",
        r=4,
        fill="model_display_name",
        stroke="white",
        filter_by=model_selection,
        tip=False,
    ),
    # axis labels
    *[
        text(
            x=label["x"],
            y=label["y"],
            text=label["label"],
            frame_anchor=label["frame_anchor"],
            styles=LabelStyles(line_width=8),
        )
        for label in labels
    ],
]

plot(
    elements,
    margin=60,
    x_axis=False,
    y_axis=False,
    width=400,
    height=400,
    legend=legend("color", target=model_selection),
)
1. Load data from a Parquet file into an inspect_viz.Data table.
2. Channels provide readable names for tooltips and the log viewer.
3. Coordinates: compute the coordinates for the axis spokes, grid circles, and labels.
4. Selection: enable interactive hovering/clicking to emphasize a single model.
5. Grid lines: line() marks draw the concentric grid circles.
6. Axis spokes: a line() mark draws the radial axes.
7. Polygon outlines: line() marks draw each model's score polygon.
8. Vertex markers: a circle() mark draws markers at the polygon vertices.
9. Axis labels: text() marks draw the metric labels around the chart.
10. Layout: draw the plot with no x/y axes, since the axes are arbitrary scales in a radar chart.
11. Legend: legend() draws a color legend targeting the model selection.

Data Preparation

The dataset for this example was created using the scores_radar_by_metric_df() function, which reads evals metadata, scales scores with percentile ranks or min-max normalization, and computes the coordinates for the radar chart.

Above we read the data for the plot from a Parquet file. This file was in turn created by:

  1. Reading evals level data into a data frame with evals_df().

  2. Converting the evals dataframe into a dataframe specific to scores_radar_by_metric() using the scores_radar_by_metric_df() function; its output can be passed directly to scores_radar_by_metric(). The function expects a scorer name, an optional list of metric names to visualize, an optional list of metric names to invert (where lower scores are better), an optional normalization method to scale scores, and an optional min-max domain to use for normalization on the radar chart.

  3. Using the prepare() function to add model_info() and log_viewer() columns to the data frame.

Here is the data preparation code end-to-end:

from inspect_ai.analysis import (
    evals_df,
    log_viewer,
    model_info,
    prepare,
)
from inspect_viz.view.beta import scores_radar_by_metric_df


df = evals_df("logs/writing_bench/")

df = scores_radar_by_metric_df(
    df,
    scorer="multi_scorer_wrapper",
    metrics=[
        "Abstract",
        "Introduction",
        "Experiments",
        "Literature Review",
        "Paper Outline",
    ],
    normalization="percentile",
)

df = prepare(df, [
    model_info(),
    log_viewer("eval", { "logs": "https://samples.meridianlabs.ai/" })
])

df.to_parquet("writing_bench.parquet")
1. Read the evals data into a dataframe.
2. Convert the dataframe into a scores_radar_by_metric()-specific dataframe.
3. A task might have multiple scorers; specify the one you want to plot. The function only supports plotting one scorer at a time, and the scorer name should correspond to columns in df named score_{scorer}_{metric}.
4. Specify a list of metrics to plot on the radar chart. If unspecified, all metrics from the scorer are plotted. Metric names in the list should correspond to columns in df named score_{scorer}_{metric}.
5. Choose an optional normalization method to scale the raw scores. Available options: "percentile" (computes percentile ranks; useful for identifying consistently strong performers), "min_max" (scales scores between the min and max values; sensitive to outliers), and "absolute" (the default; applies no normalization, which can produce hard-to-read charts when metrics have different scales). See the sketch below for intuition on the first two.
6. Add pretty model names and log links to the dataframe using prepare().
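
As a standalone illustration of the two scaling approaches (a minimal pandas sketch, independent of inspect_viz):

import pandas as pd

scores = pd.Series([0.62, 0.71, 0.55, 0.80])

# percentile rank: each score becomes its rank within the group, in (0, 1]
percentile = scores.rank(pct=True)

# min-max: linear rescaling between the observed extremes (sensitive to outliers)
min_max = (scores - scores.min()) / (scores.max() - scores.min())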