inspect_viz.sandbox

Evals

evals_summary_plot

Bar plot for comparing evals.

def evals_summary_plot(
    evals: Data,
    x: str = "model",
    fx: str = "task_name",
    y: AxisValue | None = None,
    x_filter: bool | AxisFilter = False,
    fx_filter: bool | AxisFilter = False,
) -> Component

evals Data

Evals data table (typically read using evals_df())

x str

Name of field for x axis (defaults to “model”)

fx str

Name of field for x facet (defaults to “task_name”)

y AxisValue | None

Definition for y axis (defaults to axis_score())

x_filter bool | AxisFilter

Optional filtering control for x axis.

fx_filter bool | AxisFilter

Optional filtering control for fx axis.

evals_summary_table

Table that summarizes eval scores by model and task.

def evals_summary_table(
    evals: Data, columns: Sequence[str | Column] | None = None
) -> Component

evals Data

Evals data table.

columns Sequence[str | Column] | None

Column definitions (defaults to model, task_name, and headline metric).

Axis

AxisFilter

Filter definition for plot axis.

class AxisFilter(BaseModel)

Attributes

label str | None

Filter label (defaults to column name).

value Literal['all'] | str | list[str]

Initial value (defaults to “all”, which applies no filter).

multiple bool

Enable filtering on multiple values.

width int | None

Width of filter input in pixels.
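
The three forms that value can take (“all”, a single string, or a list of strings) can be illustrated with a small stand-alone sketch. The helper select_rows below is hypothetical and is not part of inspect_viz; it only assumes that “all” leaves the rows unfiltered, a single string matches one value, and a list matches any of its members.

```python
from __future__ import annotations

from typing import Literal


def select_rows(
    rows: list[dict[str, str]],
    field: str,
    value: Literal["all"] | str | list[str] = "all",
) -> list[dict[str, str]]:
    """Hypothetical helper (not an inspect_viz API) mirroring AxisFilter.value."""
    # "all" is assumed to mean that no filter is applied.
    if value == "all":
        return list(rows)
    # A single string is treated as a one-element selection.
    allowed = {value} if isinstance(value, str) else set(value)
    return [row for row in rows if row[field] in allowed]


# Illustrative rows; the model names are placeholders, not real results.
rows = [
    {"model": "model-a", "task_name": "task-1"},
    {"model": "model-b", "task_name": "task-1"},
    {"model": "model-c", "task_name": "task-2"},
]

everything = select_rows(rows, "model")
only_a = select_rows(rows, "model", "model-a")
a_and_c = select_rows(rows, "model", ["model-a", "model-c"])
```

A list-valued selection corresponds to the case where multiple is enabled; how the actual filter control behaves in the rendered plot is not specified here.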

AxisValue

Axis value options.

class AxisValue(BaseModel)

Attributes

label str

Axis label.

value_field str

Field to read value from.

stderr_field str | None

Field to read stderr from (optional; required only for plotting confidence intervals).

ci float | None

Confidence level for the interval (e.g. 0.80, 0.90, or 0.95).

domain list[float] | None

Domain of axis (range of values to display).

axis_score

Axis definition for scores from evals_df() data frames.

def axis_score(ci: float = 0.95) -> AxisValue

ci float

Confidence level for the interval (e.g. 0.80, 0.90, or 0.95).
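
To make the relationship between a score, its stderr, and ci concrete, here is a minimal sketch of one common way to derive interval bounds. It assumes a two-sided interval under a normal approximation; this reference does not state how inspect_viz computes its intervals, and confidence_bounds is a hypothetical helper, not a library function.

```python
from statistics import NormalDist


def confidence_bounds(
    value: float, stderr: float, ci: float = 0.95
) -> tuple[float, float]:
    """Two-sided interval around value (normal approximation is an assumption)."""
    if not 0.0 < ci < 1.0:
        raise ValueError("ci must be strictly between 0 and 1")
    # Critical value of the standard normal for the requested level,
    # e.g. roughly 1.96 when ci is 0.95.
    z = NormalDist().inv_cdf(0.5 + ci / 2.0)
    return value - z * stderr, value + z * stderr


# Illustrative numbers only, not real eval results.
low, high = confidence_bounds(0.72, 0.03, ci=0.95)
narrow_low, narrow_high = confidence_bounds(0.72, 0.03, ci=0.80)
```

A lower ci gives a narrower interval for the same stderr, which is why the choice of level matters when comparing models visually.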