inspect_viz.transform

SQL

sql

SQL transform for a column.

def sql(sql: str, label: str | None = None) -> Transform
sql str

A SQL expression string to derive a new column value. Embedded Param references, such as f"{param} + 1", are supported. For expressions with aggregate functions, use agg() instead.

label str | None

A label for this expression, for example to label a plot axis.

agg

Aggregation transform for a column.

def agg(agg: str, label: str | None = None) -> Transform
agg str

A SQL expression string to calculate an aggregate value. Embedded Param references, such as f"SUM({param} + 1)", are supported. For expressions without aggregate functions, use sql() instead.

label str | None

A label for this expression, for example to label a plot axis.

Column

column

Interpret a string or param-value as a column reference.

def column(column: str | Param) -> Transform
column str | Param

Column name or parameter.

bin

Bin a continuous variable into discrete intervals.

def bin(
    bin: str | float | bool | Param | Sequence[str | float | bool | Param],
    interval: Literal[
        "date",
        "number",
        "millisecond",
        "second",
        "minute",
        "hour",
        "day",
        "month",
        "year",
    ]
    | None = None,
    step: float | None = None,
    steps: float | None = None,
    minstep: float | None = None,
    nice: bool | None = None,
    offset: float | None = None,
) -> Transform
bin str | float | bool | Param | Sequence[str | float | bool | Param]

Specifies a data column or expression to bin. Both numerical and temporal (date/time) values are supported.

interval Literal['date', 'number', 'millisecond', 'second', 'minute', 'hour', 'day', 'month', 'year'] | None

The interval bin unit to use, typically used to indicate a date/time unit for binning temporal values, such as hour, day, or month. If date, the extent of data values is used to automatically select an interval for temporal data. The value number enforces normal numerical binning, even over temporal data. If unspecified, defaults to number for numerical data and date for temporal data.

step float | None

The step size to use between bins. When binning numerical values (or interval type number), this setting specifies the numerical step size. For date/time intervals, this indicates the number of steps of that unit, such as hours, days, or years.

steps float | None

The target number of binning steps to use. To accommodate human-friendly (“nice”) bin boundaries, the actual number of bins may diverge from this exact value. This option is ignored when step is specified.

minstep float | None

The minimum allowed bin step size (default 0) when performing numerical binning. For example, a setting of 1 prevents step sizes less than 1. This option is ignored when step is specified.

nice bool | None

A flag (default true) requesting “nice” human-friendly end points and step sizes when performing numerical binning. When step is specified, this option affects the binning end points (e.g., origin) only.

offset float | None

Offset for computed bins (default 0). For example, a value of 1 will result in using the next consecutive bin boundary.
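To make the interaction of step and offset concrete, here is a minimal pure-Python sketch of fixed-step numerical binning. It is a simplified model under stated assumptions, not the library's implementation: the helper name bin_floor is hypothetical, and the sketch ignores steps, minstep, and nice.

```python
import math


def bin_floor(value: float, step: float, offset: int = 0) -> float:
    """Return the lower boundary of the fixed-width bin containing value.

    Hypothetical helper for illustration only. offset shifts the result
    by whole bins, so offset=1 yields the next consecutive boundary.
    """
    return (math.floor(value / step) + offset) * step


# With a step size of 2, the value 7.3 falls in the bin [6, 8).
lower = bin_floor(7.3, step=2.0)
upper = bin_floor(7.3, step=2.0, offset=1)
print(lower, upper)
```

Computing the bin twice, once with offset 0 and once with offset 1, is one way to obtain both edges of an interval.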

date_day

Transform a Date value to a day of the month for cyclic comparison.

Year and month values are collapsed to enable comparison over days only.

def date_day(expr: str | Param) -> Transform
expr str | Param

Expression or parameter.

date_month

Transform a Date value to a month boundary for cyclic comparison.

Year values are collapsed to enable comparison over months only.

def date_month(expr: str | Param) -> Transform
expr str | Param

Expression or parameter.

date_month_day

Map date/times to a month and day value, all within the same year for comparison.

The resulting value is still date-typed.

def date_month_day(expr: str | Param) -> Transform
expr str | Param

Expression or parameter.
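The three date transforms above all collapse part of a date so that values from different years (or months) line up. The following standard-library sketch illustrates the idea on plain Python dates. The function names and the reference year are assumptions of this sketch (a leap year is chosen so that February 29 stays valid); the year the library actually uses is not stated here.

```python
from datetime import date

# Reference year chosen for this sketch only; a leap year keeps Feb 29 valid.
REFERENCE_YEAR = 2000


def sketch_date_day(d: date) -> date:
    """Collapse year and month, keeping only the day of the month."""
    return date(REFERENCE_YEAR, 1, d.day)


def sketch_date_month(d: date) -> date:
    """Collapse the year and snap to the first day of the month."""
    return date(REFERENCE_YEAR, d.month, 1)


def sketch_date_month_day(d: date) -> date:
    """Collapse the year, keeping month and day."""
    return date(REFERENCE_YEAR, d.month, d.day)


a = date(2021, 3, 15)
b = date(2023, 3, 15)
# Dates from different years become equal once the year is collapsed.
print(sketch_date_month_day(a) == sketch_date_month_day(b))
```

Because the results are still dates, they can be placed on an ordinary temporal axis.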

Aggregate

avg

Compute the average (mean) value of the given column.

def avg(
    col: TransformArg | None = None,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg | None

Column to compute the mean for.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

count

A count aggregate transform.

def count(
    col: TransformArg | None = None,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg | None

Compute the count of records in an aggregation group. If specified, only non-null expression values are counted. If omitted, all rows within a group are counted.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.
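The difference between counting rows, counting non-null values, and counting distinct values can be sketched in plain Python. The helper count_rows and its column_given flag are hypothetical names used only for this illustration, with None standing in for SQL NULL.

```python
from typing import Optional, Sequence


def count_rows(
    values: Sequence[Optional[object]],
    column_given: bool = True,
    distinct: bool = False,
) -> int:
    """Model the count semantics described above for one aggregation group."""
    if not column_given:
        # No column: every row in the group is counted.
        return len(values)
    non_null = [v for v in values if v is not None]
    # With distinct, duplicate non-null values are counted once.
    return len(set(non_null)) if distinct else len(non_null)


group = ["a", None, "b", "a"]
print(count_rows(group, column_given=False))
print(count_rows(group))
print(count_rows(group, distinct=True))
```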

sum

Compute the sum of the given column.

def sum(
    col: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to compute the sum for.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

min

Compute the minimum value of the given column.

def min(
    col: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to compute the minimum for.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

max

Compute the maximum value of the given column.

def max(
    col: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to compute the maximum for.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

median

Compute the median value of the given column.

def median(
    col: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to compute the median for.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

mode

Compute the mode value of the given column.

def mode(
    col: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to compute the mode for.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

first

Return the first column value found in an aggregation group.

def first(
    col: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to get the first value from.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

last

Return the last column value found in an aggregation group.

def last(
    col: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to get the last value from.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

product

Compute the product of the given column.

def product(
    col: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to compute the product for.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

quantile

Compute the quantile value of the given column at the provided probability threshold.

def quantile(
    col: TransformArg,
    threshold: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to compute the quantile for.

threshold TransformArg

Probability threshold (e.g., 0.5 for median).

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.
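As a rough illustration of what a probability threshold means, the sketch below computes a quantile with linear interpolation in plain Python and checks that a threshold of 0.5 agrees with the median. The function quantile_linear is a hypothetical helper; the interpolation rule the underlying database applies may differ.

```python
import statistics
from typing import Sequence


def quantile_linear(values: Sequence[float], threshold: float) -> float:
    """Quantile by linear interpolation between the two nearest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    ordered = sorted(values)
    position = threshold * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


data = [5, 1, 3, 2, 4]
# A threshold of 0.5 matches the median of the data.
print(quantile_linear(data, 0.5) == statistics.median(data))
```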

stddev

Compute the standard deviation of the given column.

def stddev(
    col: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to compute the standard deviation for.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

stddev_pop

Compute the population standard deviation of the given column.

def stddev_pop(
    col: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to compute the population standard deviation for.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

variance

Compute the sample variance of the given column.

def variance(
    col: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to compute the variance for.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

var_pop

Compute the population variance of the given column.

def var_pop(
    col: TransformArg,
    distinct: bool | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to compute the population variance for.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.
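The distinction between the sample and population forms of variance and standard deviation can be checked with Python's statistics module. This sketch only illustrates the arithmetic; it assumes that stddev follows the usual SQL convention of returning the sample statistic, which the text above does not state explicitly.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample statistics divide the sum of squared deviations by n - 1.
sample_variance = statistics.variance(values)
sample_stddev = statistics.stdev(values)

# Population statistics divide by n.
population_variance = statistics.pvariance(values)
population_stddev = statistics.pstdev(values)

print(sample_variance, population_variance)
print(sample_stddev, population_stddev)
```

For the same data the sample figures are always at least as large as the population figures, and the gap shrinks as the group grows.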

argmin

Find a value of the first column that minimizes the second column.

def argmin(
    col1: TransformArg,
    col2: TransformArg,
    distinct: bool | None,
    **options: Unpack[WindowOptions],
) -> Transform
col1 TransformArg

Column to yield the value from.

col2 TransformArg

Column to minimize. The value of col1 from the row with the minimum col2 is returned.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.

argmax

Find a value of the first column that maximizes the second column.

def argmax(
    col1: TransformArg,
    col2: TransformArg,
    distinct: bool | None,
    **options: Unpack[WindowOptions],
) -> Transform
col1 TransformArg

Column to yield the value from.

col2 TransformArg

Column to maximize. The value of col1 from the row with the maximum col2 is returned.

distinct bool | None

Aggregate distinct.

**options Unpack[WindowOptions]

Window transform options.
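The relationship between the two columns in argmin and argmax is easy to misread, so here is a small plain-Python sketch of the semantics. The helper arg_extreme and the sample rows are hypothetical and exist only for this illustration.

```python
from typing import Any, Callable, Dict, List

rows: List[Dict[str, Any]] = [
    {"city": "Oslo", "temp": -3.0},
    {"city": "Cairo", "temp": 31.5},
    {"city": "Lima", "temp": 19.0},
]


def arg_extreme(
    rows: List[Dict[str, Any]],
    col1: str,
    col2: str,
    pick: Callable[..., Dict[str, Any]] = min,
) -> Any:
    """Return the col1 value from the row whose col2 is smallest (or largest)."""
    best_row = pick(rows, key=lambda row: row[col2])
    return best_row[col1]


coldest = arg_extreme(rows, "city", "temp", pick=min)  # argmin-style lookup
warmest = arg_extreme(rows, "city", "temp", pick=max)  # argmax-style lookup
print(coldest, warmest)
```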

Window

row_number

Compute the 1-based row number over an ordered window partition.

def row_number(**options: Unpack[WindowOptions]) -> Transform
**options Unpack[WindowOptions]

Window transform options.

rank

Compute the row rank over an ordered window partition.

Sorting ties result in gaps in the rank numbers ([1, 1, 3, …]).

def rank(**options: Unpack[WindowOptions]) -> Transform
**options Unpack[WindowOptions]

Window transform options.

dense_rank

Compute the dense row rank (no gaps) over an ordered window partition.

Sorting ties do not result in gaps in the rank numbers ([1, 1, 2, …]).

def dense_rank(**options: Unpack[WindowOptions]) -> Transform
**options Unpack[WindowOptions]

Window transform options.
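The difference between row_number, rank, and dense_rank only shows up when the ordering has ties. The following plain-Python sketch models the three numbering schemes over an already sorted partition; the function names are hypothetical and the code is not the library's implementation.

```python
from typing import List, Sequence


def sketch_row_number(sorted_values: Sequence[float]) -> List[int]:
    """Consecutive 1-based numbers, regardless of ties."""
    return list(range(1, len(sorted_values) + 1))


def sketch_rank(sorted_values: Sequence[float]) -> List[int]:
    """Ties share a rank, and the following rank skips ahead."""
    ranks: List[int] = []
    for i, value in enumerate(sorted_values):
        if i > 0 and value == sorted_values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(i + 1)
    return ranks


def sketch_dense_rank(sorted_values: Sequence[float]) -> List[int]:
    """Ties share a rank, and the following rank is the next integer."""
    ranks: List[int] = []
    for i, value in enumerate(sorted_values):
        if i == 0:
            ranks.append(1)
        elif value == sorted_values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1)
    return ranks


scores = [10, 10, 20, 30]
print(sketch_row_number(scores), sketch_rank(scores), sketch_dense_rank(scores))
```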

percent_rank

Compute the percentage rank over an ordered window partition.

def percent_rank(**options: Unpack[WindowOptions]) -> Transform
**options Unpack[WindowOptions]

Window transform options.

cume_dist

Compute the cumulative distribution value over an ordered window partition.

Equals the number of partition rows preceding or peer with the current row, divided by the total number of partition rows.

def cume_dist(**options: Unpack[WindowOptions]) -> Transform
**options Unpack[WindowOptions]

Window transform options.
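A worked example helps separate percent_rank from cume_dist. The sketch below uses the standard SQL definitions, with percent_rank taken as (rank - 1) / (partition rows - 1), applied to an already sorted partition. The function names are hypothetical.

```python
from typing import List, Sequence


def sketch_percent_rank(sorted_values: Sequence[float]) -> List[float]:
    """(rank - 1) / (n - 1), where rank leaves gaps after ties."""
    n = len(sorted_values)
    if n <= 1:
        return [0.0] * n
    result = []
    for value in sorted_values:
        rank = 1 + sum(1 for other in sorted_values if other < value)
        result.append((rank - 1) / (n - 1))
    return result


def sketch_cume_dist(sorted_values: Sequence[float]) -> List[float]:
    """Rows preceding or peer with the current row, divided by n."""
    n = len(sorted_values)
    return [
        sum(1 for other in sorted_values if other <= value) / n
        for value in sorted_values
    ]


partition = [10, 20, 20, 30]
print(sketch_percent_rank(partition))
print(sketch_cume_dist(partition))
```

Note that percent_rank starts at 0 for the first row, while cume_dist is always greater than 0 and reaches 1 on the last row.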

n_tile

Compute an n-tile integer ranging from 1 to num_buckets dividing the partition as equally as possible.

def n_tile(num_buckets: int, **options: Unpack[WindowOptions]) -> Transform
num_buckets int

Number of buckets.

**options Unpack[WindowOptions]

Window transform options.
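"As equally as possible" means bucket sizes differ by at most one row. The sketch below follows the usual SQL convention, in which the earlier buckets receive the extra rows when the partition does not divide evenly. The function sketch_n_tile is a hypothetical helper that works on a row count rather than on real data.

```python
from typing import List


def sketch_n_tile(num_rows: int, num_buckets: int) -> List[int]:
    """Assign a 1-based bucket number to each of num_rows ordered rows."""
    base, extra = divmod(num_rows, num_buckets)
    tiles: List[int] = []
    for bucket in range(1, num_buckets + 1):
        # The first `extra` buckets take one additional row each.
        size = base + (1 if bucket <= extra else 0)
        tiles.extend([bucket] * size)
    return tiles


print(sketch_n_tile(7, 3))
```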

lag

Compute lagging values in a column.

Returns the value at the row that is at offset rows (default 1) before the current row within the window frame.

def lag(
    col: TransformArg,
    offset: int = 1,
    default: TransformArg | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to take value from.

offset int

Rows to offset.

default TransformArg | None

Default value if there is no such row.

**options Unpack[WindowOptions]

Window transform options.

lead

Compute leading values in a column.

Returns the value at the row that is at offset rows (default 1) after the current row within the window frame.

def lead(
    col: TransformArg,
    offset: int = 1,
    default: TransformArg | None = None,
    **options: Unpack[WindowOptions],
) -> Transform
col TransformArg

Column to take value from.

offset int

Rows to offset.

default TransformArg | None

Default value if there is no such row.

**options Unpack[WindowOptions]

Window transform options.
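The behaviour of lag and lead, including the role of default, can be modelled with a single shifting helper in plain Python. The names below are hypothetical, and None stands in for a missing (NULL) value.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sketch_lag(
    values: Sequence[T], offset: int = 1, default: Optional[T] = None
) -> List[Optional[T]]:
    """Value from `offset` rows before each row, or default if none exists."""
    n = len(values)
    return [
        values[i - offset] if 0 <= i - offset < n else default
        for i in range(n)
    ]


def sketch_lead(
    values: Sequence[T], offset: int = 1, default: Optional[T] = None
) -> List[Optional[T]]:
    """Value from `offset` rows after each row, or default if none exists."""
    return sketch_lag(values, -offset, default)


series = [10, 20, 30]
print(sketch_lag(series))
print(sketch_lead(series))
print(sketch_lag(series, offset=2, default=0))
```

A common use is subtracting the lagged value from the current value to get a row-over-row change.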

first_value

Get the first value of the given column in the current window frame.

def first_value(col: TransformArg, **options: Unpack[WindowOptions]) -> Transform
col TransformArg

Aggregate column to take first value from.

**options Unpack[WindowOptions]

Window transform options.

last_value

Get the last value of the given column in the current window frame.

def last_value(col: TransformArg, **options: Unpack[WindowOptions]) -> Transform
col TransformArg

Aggregate column to take last value from.

**options Unpack[WindowOptions]

Window transform options.

nth_value

Get the nth value of the given column in the current window frame, counting from one.

def nth_value(
    col: TransformArg, offset: int, **options: Unpack[WindowOptions]
) -> Transform
col TransformArg

Aggregate column to take nth value from.

offset int

Offset for the nth row.

**options Unpack[WindowOptions]

Window transform options.
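Since nth_value counts from one rather than zero, a short plain-Python sketch may help avoid off-by-one mistakes. The helper below is hypothetical and operates on a list standing in for the rows of the current window frame; returning None for an out-of-range position mirrors the usual SQL behaviour of yielding NULL.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def sketch_nth_value(frame: List[T], n: int) -> Optional[T]:
    """Return the n-th frame value (1-based), or None if there is no such row."""
    return frame[n - 1] if 1 <= n <= len(frame) else None


frame = ["a", "b", "c"]
first = sketch_nth_value(frame, 1)           # same as the first value in the frame
last = sketch_nth_value(frame, len(frame))   # same as the last value in the frame
print(first, last, sketch_nth_value(frame, 4))
```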

Types

Transform

Column transformation operation.

class Transform(dict[str, JsonValue])

WindowOptions

Window transform options.

class WindowOptions(TypedDict, total=False)

Attributes

orderby str | Param | Sequence[str | Param]

One or more expressions by which to sort a windowed version of this aggregate function.

partitionby str | Param | Sequence[str | Param]

One or more expressions by which to partition a windowed version of this aggregate function.

rows Sequence[float | None] | Param

Window rows frame specification as an array or array-valued expression.

range Sequence[float | None] | Param

Window range frame specification as an array or array-valued expression.
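To illustrate what a rows frame does, the sketch below computes a moving average in plain Python over a frame described by a number of preceding and following rows. The parameter names preceding and following, and the use of None for an unbounded side, are assumptions of this sketch; the exact encoding of the two-element rows array (including its sign convention) should be checked against the library before use.

```python
from typing import List, Optional, Sequence


def rows_frame(
    values: Sequence[float],
    index: int,
    preceding: Optional[int],
    following: Optional[int],
) -> List[float]:
    """Rows in the frame around `index`; None means unbounded on that side."""
    start = 0 if preceding is None else max(0, index - preceding)
    stop = (
        len(values)
        if following is None
        else min(len(values), index + following + 1)
    )
    return list(values[start:stop])


def moving_average(
    values: Sequence[float],
    preceding: Optional[int],
    following: Optional[int],
) -> List[float]:
    """Average of each row's frame, in the original row order."""
    averages = []
    for i in range(len(values)):
        frame = rows_frame(values, i, preceding, following)
        averages.append(sum(frame) / len(frame))
    return averages


data = [1.0, 2.0, 3.0, 4.0]
print(moving_average(data, preceding=1, following=0))     # two-row trailing mean
print(moving_average(data, preceding=None, following=0))  # running mean
```

A range frame works the same way in spirit, except that membership is decided by the distance between ordering values rather than by row positions.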