inspect_viz.transform
SQL
sql
SQL transform for a column.
def sql(sql: str, label: str | None = None) -> Transform
sql: str
A SQL expression string to derive a new column value. Embedded Param references, such as f"{param} + 1", are supported. For expressions with aggregate functions, use agg() instead.
label: str | None
A label for this expression, for example to label a plot axis.
agg
Aggregation transform for a column.
def agg(agg: str, label: str | None = None) -> Transform
agg: str
A SQL expression string to calculate an aggregate value. Embedded Param references, such as f"SUM({param} + 1)", are supported. For expressions without aggregate functions, use sql() instead.
label: str | None
A label for this expression, for example to label a plot axis.
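The split between sql() and agg() follows the usual distinction between a per-row SQL expression and an aggregate expression. The sketch below illustrates that distinction with Python's built-in sqlite3 module rather than with inspect_viz itself; the table t and column x are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical in-memory table used only for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# Row-level expression (the kind of string sql() accepts): one value per row.
per_row = [r[0] for r in con.execute("SELECT x + 1 FROM t ORDER BY x")]

# Aggregate expression (the kind of string agg() accepts): one value per group.
total = con.execute("SELECT SUM(x + 1) FROM t").fetchone()[0]

print(per_row)  # [2, 3, 4]
print(total)    # 9
```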
Column
column
Interpret a string or param-value as a column reference.
def column(column: str | Param) -> Transform
column: str | Param
Column name or parameter.
bin
Bin a continuous variable into discrete intervals.
def bin(
bin: str | float | bool | Param | Sequence[str | float | bool | Param],
interval: Literal[
"date",
"number",
"millisecond",
"second",
"minute",
"hour",
"day",
"month",
"year",
]
| None = None,
step: float | None = None,
steps: float | None = None,
minstep: float | None = None,
nice: bool | None = None,
offset: float | None = None,
) -> Transform
bin: str | float | bool | Param | Sequence[str | float | bool | Param]
Specifies a data column or expression to bin. Both numerical and temporal (date/time) values are supported.
interval: Literal['date', 'number', 'millisecond', 'second', 'minute', 'hour', 'day', 'month', 'year'] | None
The interval bin unit to use, typically used to indicate a date/time unit for binning temporal values, such as hour, day, or month. If date, the extent of data values is used to automatically select an interval for temporal data. The value number enforces normal numerical binning, even over temporal data. If unspecified, defaults to number for numerical data and date for temporal data.
step: float | None
The step size to use between bins. When binning numerical values (or interval type number), this setting specifies the numerical step size. For date/time intervals, this indicates the number of steps of that unit, such as hours, days, or years.
steps: float | None
The target number of binning steps to use. To accommodate human-friendly (“nice”) bin boundaries, the actual number of bins may diverge from this exact value. This option is ignored when step is specified.
minstep: float | None
The minimum allowed bin step size (default 0) when performing numerical binning. For example, a setting of 1 prevents step sizes less than 1. This option is ignored when step is specified.
nice: bool | None
A flag (default true) requesting “nice” human-friendly end points and step sizes when performing numerical binning. When step is specified, this option affects the binning end points (e.g., origin) only.
offset: float | None
Offset for computed bins (default 0). For example, a value of 1 will result in using the next consecutive bin boundary.
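As a rough illustration of how step and offset interact for numerical data, the sketch below computes the lower boundary of the bin that contains a value. The formula is an assumption chosen to match the description above (an offset of 1 selects the next consecutive boundary); it is not taken from the library's implementation, and bin_floor is a hypothetical helper name.

```python
import math

def bin_floor(value, step, offset=0):
    """Lower boundary of the bin containing value.

    Assumed formula for illustration: step * (floor(value / step) + offset).
    """
    return step * (math.floor(value / step) + offset)

print(bin_floor(7.3, step=5))            # 5
print(bin_floor(7.3, step=5, offset=1))  # 10
```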
date_day
Transform a Date value to a day of the month for cyclic comparison.
Year and month values are collapsed to enable comparison over days only.
def date_day(expr: str | Param) -> Transform
expr: str | Param
Expression or parameter.
date_month
Transform a Date value to a month boundary for cyclic comparison.
Year values are collapsed to enable comparison over months only.
def date_month(expr: str | Param) -> Transform
expr: str | Param
Expression or parameter.
date_month_day
Map date/times to a month and day value, all within the same year for comparison.
The resulting value is still date-typed.
def date_month_day(expr: str | Param) -> Transform
expr: str | Param
Expression or parameter.
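The idea of collapsing dates into a shared year can be sketched in plain Python. This is only an illustration of the concept: the reference year is an assumption (a leap year is chosen so that February 29 stays representable), and month_day is a hypothetical helper, not part of inspect_viz.

```python
from datetime import date

# Assumption: any fixed leap year works as the shared reference year,
# because it keeps February 29 representable.
REFERENCE_YEAR = 2012

def month_day(d: date) -> date:
    """Collapse the year so that only month and day remain comparable."""
    return d.replace(year=REFERENCE_YEAR)

a = month_day(date(1999, 3, 15))
b = month_day(date(2021, 3, 15))
print(a == b)                            # True
print(month_day(date(2020, 2, 29)) < a)  # True
```

The result is still a date, so it can be placed on a temporal axis to compare the same calendar days across years.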
epoch_ms
Transform a Date value to epoch milliseconds.
def epoch_ms(expr: str | Param) -> Transform
expr: str | Param
Expression or parameter.
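For reference, epoch milliseconds count the milliseconds elapsed since 1970-01-01T00:00:00Z. The plain-Python sketch below shows the conversion for a timezone-aware datetime; to_epoch_ms is a hypothetical helper used only for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_ms(dt: datetime) -> int:
    """Whole milliseconds elapsed since the Unix epoch (UTC)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)

print(to_epoch_ms(datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)))  # 1000
```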
Aggregate
avg
Compute the average (mean) value of the given column.
def avg(
col: TransformArg | None = None,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg | None
Column to compute the mean for.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
count
A count aggregate transform.
def count(
col: TransformArg | None = None,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg | None
Compute the count of records in an aggregation group. If specified, only non-null expression values are counted. If omitted, all rows within a group are counted.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
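The three counting behaviors described above (all rows, non-null values only, distinct values only) correspond to familiar SQL forms. The sketch below demonstrates them with Python's built-in sqlite3 module; how count() maps onto these SQL forms is an assumption, and the table t and column x are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (1,), (None,), (2,)])

# COUNT(*) counts every row, COUNT(x) skips nulls,
# COUNT(DISTINCT x) counts each non-null value once.
all_rows, non_null, distinct = con.execute(
    "SELECT COUNT(*), COUNT(x), COUNT(DISTINCT x) FROM t"
).fetchone()

print(all_rows, non_null, distinct)  # 4 3 2
```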
sum
Compute the sum of the given column.
def sum(
col: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to compute the sum for.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
min
Compute the minimum value of the given column.
def min(
col: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to compute the minimum for.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
max
Compute the maximum value of the given column.
def max(
col: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to compute the maximum for.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
median
Compute the median value of the given column.
def median(
col: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to compute the median for.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
mode
Compute the mode value of the given column.
def mode(
col: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to compute the mode for.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
first
Return the first column value found in an aggregation group.
def first(
col: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to get the first value from.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
last
Return the last column value found in an aggregation group.
def last(
col: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to get the last value from.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
product
Compute the product of the given column.
def product(
col: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to compute the product for.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
quantile
Compute the quantile value of the given column at the provided probability threshold.
def quantile(
col: TransformArg,
threshold: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to compute the quantile for.
threshold: TransformArg
Probability threshold (e.g., 0.5 for median).
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
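To make the role of the probability threshold concrete, the sketch below computes a quantile in plain Python. The linear interpolation rule is an assumption made for this example; the underlying database may instead return an observed value (a discrete quantile), so results can differ for thresholds that fall between rows.

```python
import statistics

def quantile(values, threshold):
    """Quantile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    position = threshold * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

data = [7, 1, 5, 3, 9]
print(quantile(data, 0.5))   # 5.0, matches statistics.median(data)
print(quantile(data, 0.25))  # 3.0
```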
stddev
Compute the standard deviation of the given column.
def stddev(
col: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to compute the standard deviation for.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
stddev_pop
Compute the population standard deviation of the given column.
def stddev_pop(
col: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to compute the population standard deviation for.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
variance
Compute the sample variance of the given column.
def variance(
col: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to compute the variance for.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
var_pop
Compute the population variance of the given column.
def var_pop(
col: TransformArg,
distinct: bool | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to compute the population variance for.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
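The sample and population variants (variance vs. var_pop, stddev vs. stddev_pop) differ only in the divisor: n - 1 for sample statistics and n for population statistics. The sketch below shows the difference using Python's statistics module on made-up data; it illustrates the definitions and does not call inspect_viz.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean is 5.0

# Sample statistics divide the squared deviations by n - 1.
sample_var = statistics.variance(data)
sample_std = statistics.stdev(data)

# Population statistics divide by n.
pop_var = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

print(round(sample_var, 3), round(pop_var, 3))
print(round(sample_std, 3), round(pop_std, 3))
```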
argmin
Find a value of the first column that minimizes the second column.
def argmin(
col1: TransformArg,
col2: TransformArg,
distinct: bool | None,
**options: Unpack[WindowOptions],
) -> Transform
col1: TransformArg
Column to yield the value from.
col2: TransformArg
Column to check for minimum corresponding value of col1.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
argmax
Find a value of the first column that maximizes the second column.
def argmax(
col1: TransformArg,
col2: TransformArg,
distinct: bool | None,
**options: Unpack[WindowOptions],
) -> Transform
col1: TransformArg
Column to yield the value from.
col2: TransformArg
Column to check for maximum corresponding value of col1.
distinct: bool | None
Aggregate distinct.
**options: Unpack[WindowOptions]
Window transform options.
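The relationship between the two columns in argmin and argmax is easy to misread, so here is a plain-Python sketch of the semantics: the result comes from col1, while col2 decides which row is selected. The rows and column names are made up for illustration, and the helper functions are stand-ins rather than the library's code.

```python
rows = [
    {"model": "a", "score": 0.71},
    {"model": "b", "score": 0.64},
    {"model": "c", "score": 0.88},
]

def argmin(rows, col1, col2):
    """Value of col1 in the row where col2 is smallest."""
    return min(rows, key=lambda r: r[col2])[col1]

def argmax(rows, col1, col2):
    """Value of col1 in the row where col2 is largest."""
    return max(rows, key=lambda r: r[col2])[col1]

print(argmin(rows, "model", "score"))  # b
print(argmax(rows, "model", "score"))  # c
```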
ci_bounds
Compute a confidence interval boundary.
Returns a tuple of two Transform objects corresponding to the lower and upper bounds of the confidence interval.
Specify the confidence interval either as:
- A level and stderr column (where a z-score for level will be offset from the stderr); or
- Explicit lower and upper columns which should already be on the desired scale (e.g., z*stderr, bootstrap deltas, HDIs from Bayesian posterior distributions, etc.).
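For the first form, the arithmetic can be sketched in plain Python. This assumes a two-sided interval under a normal approximation, which the text above suggests but does not state outright; z_bounds is a hypothetical helper and the numbers are made up.

```python
from statistics import NormalDist

def z_bounds(score, stderr, level=0.95):
    """Two-sided normal interval: score -/+ z * stderr."""
    # For level=0.95 this yields z of roughly 1.96.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return score - z * stderr, score + z * stderr

lower, upper = z_bounds(score=0.80, stderr=0.02, level=0.95)
print(round(lower, 3), round(upper, 3))  # 0.761 0.839
```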
def ci_bounds(
score: str | Param,
*,
level: float | None = None,
stderr: str | Param | None = None,
lower: str | Param | None = None,
upper: str | Param | None = None,
) -> tuple[Transform, Transform]
Window
row_number
Compute the 1-based row number over an ordered window partition.
def row_number(**options: Unpack[WindowOptions]) -> Transform
**options: Unpack[WindowOptions]
Window transform options.
rank
Compute the row rank over an ordered window partition.
Sorting ties result in gaps in the rank numbers ([1, 1, 3, …]).
def rank(**options: Unpack[WindowOptions]) -> Transform
**options: Unpack[WindowOptions]
Window transform options.
dense_rank
Compute the dense row rank (no gaps) over an ordered window partition.
Sorting ties do not result in gaps in the rank numbers ([1, 1, 2, …]).
def dense_rank(**options: Unpack[WindowOptions]) -> Transform
**options: Unpack[WindowOptions]
Window transform options.
percent_rank
Compute the percentage rank over an ordered window partition.
def percent_rank(**options: Unpack[WindowOptions]) -> Transform
**options: Unpack[WindowOptions]
Window transform options.
cume_dist
Compute the cumulative distribution value over an ordered window partition.
Equals the number of partition rows preceding or peer with the current row, divided by the total number of partition rows.
def cume_dist(**options: Unpack[WindowOptions]) -> Transform
**options: Unpack[WindowOptions]
Window transform options.
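The four ranking functions above differ mainly in how they treat ties. The plain-Python sketch below computes each one for a small, already ordered partition, using the standard SQL definitions; it illustrates the semantics and is not how inspect_viz evaluates them.

```python
values = [10, 10, 20, 30]  # one partition, already sorted by the window ordering
n = len(values)

# rank: 1 + number of rows strictly before the current row's peers (gaps after ties)
rank = [1 + sum(v < x for v in values) for x in values]
# dense_rank: 1 + number of distinct values strictly before (no gaps)
dense_rank = [1 + len({v for v in values if v < x}) for x in values]
# percent_rank: (rank - 1) / (rows - 1)
percent_rank = [(r - 1) / (n - 1) for r in rank]
# cume_dist: rows preceding or peer with the current row / total rows
cume_dist = [sum(v <= x for v in values) / n for x in values]

print(rank)        # [1, 1, 3, 4]
print(dense_rank)  # [1, 1, 2, 3]
print(cume_dist)   # [0.5, 0.5, 0.75, 1.0]
```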
n_tile
Compute an n-tile integer ranging from 1 to num_buckets dividing the partition as equally as possible.
def n_tile(num_buckets: int, **options: Unpack[WindowOptions]) -> Transform
num_buckets: int
Number of buckets.
**options: Unpack[WindowOptions]
Window transform options.
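When the partition size is not a multiple of num_buckets, "as equally as possible" means some buckets hold one extra row. The sketch below follows the usual SQL NTILE convention of giving the extra rows to the earliest buckets; n_tile_buckets is a hypothetical helper for illustration.

```python
def n_tile_buckets(n_rows, num_buckets):
    """Bucket number (1-based) for each of n_rows ordered rows."""
    base, extra = divmod(n_rows, num_buckets)
    buckets = []
    for bucket in range(1, num_buckets + 1):
        # The first `extra` buckets each receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets

print(n_tile_buckets(7, 3))  # [1, 1, 1, 2, 2, 3, 3]
```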
lag
Compute lagging values in a column.
Returns the value at the row that is at offset rows (default 1) before the current row within the window frame.
def lag(
col: TransformArg,
offset: int = 1,
default: TransformArg | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to take value from.
offset: int
Rows to offset.
default: TransformArg | None
Default value if there is no such row.
**options: Unpack[WindowOptions]
Window transform options.
lead
Compute leading values in a column.
Returns the value at the row that is at offset rows (default 1) after the current row within the window frame.
def lead(
col: TransformArg,
offset: int = 1,
default: TransformArg | None = None,
**options: Unpack[WindowOptions],
) -> Transform
col: TransformArg
Column to take value from.
offset: int
Rows to offset.
default: TransformArg | None
Default value if there is no such row.
**options: Unpack[WindowOptions]
Window transform options.
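The effect of offset and default in lag and lead can be sketched over a plain list standing in for an ordered window. These helpers only illustrate the semantics described above; they assume a non-negative offset and are not the library's functions.

```python
def lag(values, offset=1, default=None):
    """Value `offset` rows before each row, or `default` if there is none."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]

def lead(values, offset=1, default=None):
    """Value `offset` rows after each row, or `default` if there is none."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default
            for i in range(n)]

scores = [1, 2, 3]
print(lag(scores))                       # [None, 1, 2]
print(lead(scores))                      # [2, 3, None]
print(lag(scores, offset=2, default=0))  # [0, 0, 1]
```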
first_value
Get the first value of the given column in the current window frame.
def first_value(col: TransformArg, **options: Unpack[WindowOptions]) -> Transform
col: TransformArg
Aggregate column to take first value from.
**options: Unpack[WindowOptions]
Window transform options.
last_value
Get the last value of the given column in the current window frame.
def last_value(col: TransformArg, **options: Unpack[WindowOptions]) -> Transform
col: TransformArg
Aggregate column to take last value from.
**options: Unpack[WindowOptions]
Window transform options.
nth_value
Get the nth value of the given column in the current window frame, counting from one.
def nth_value(
col: TransformArg, offset: int, **options: Unpack[WindowOptions]
) -> Transform
col: TransformArg
Aggregate column to take nth value from.
offset: int
Offset for the nth row.
**options: Unpack[WindowOptions]
Window transform options.
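Because nth_value counts from one, an offset of 1 matches first_value. The sketch below shows the three frame lookups over a plain list standing in for the current window frame; returning None when the frame is too short mirrors SQL's NULL result and is an assumption about how that case surfaces here.

```python
def nth_value(frame, offset):
    """1-based lookup; None when the frame has fewer than `offset` rows."""
    return frame[offset - 1] if 1 <= offset <= len(frame) else None

frame = ["a", "b", "c"]  # hypothetical window frame contents
print(frame[0], frame[-1])  # first and last value of the frame: a c
print(nth_value(frame, 2))  # b
print(nth_value(frame, 5))  # None
```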
Types
Transform
Column transformation operation.
Transform: TypeAlias = dict[str, JsonValue]
WindowOptions
Window transform options.
class WindowOptions(TypedDict, total=False)
Attributes
orderby: str | Param | Sequence[str | Param]
One or more expressions by which to sort a windowed version of this aggregate function.
partitionby: str | Param | Sequence[str | Param]
One or more expressions by which to partition a windowed version of this aggregate function.
rows: Sequence[float | None] | Param
Window rows frame specification as an array or array-valued expression.
range: Sequence[float | None] | Param
Window range frame specification as an array or array-valued expression.
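To show what partitioning and ordering mean for a window function, the plain-Python sketch below numbers rows within each partition after sorting them, which is the behavior row_number describes. The data, the column names, and the numbered_rows helper are hypothetical; frame specifications (rows and range) are not covered here.

```python
from itertools import groupby

rows = [
    {"task": "math", "score": 0.9},
    {"task": "code", "score": 0.4},
    {"task": "math", "score": 0.7},
    {"task": "code", "score": 0.6},
]

def numbered_rows(rows, partitionby, orderby):
    """Number rows 1..k inside each partition, following the given order."""
    ordered = sorted(rows, key=lambda r: (r[partitionby], r[orderby]))
    result = []
    for _, group in groupby(ordered, key=lambda r: r[partitionby]):
        for i, row in enumerate(group, start=1):
            result.append({**row, "row_number": i})
    return result

numbered = numbered_rows(rows, partitionby="task", orderby="score")
for row in numbered:
    print(row)
```

Numbering restarts at 1 for each task, which is the effect of partitionby; orderby determines which row within the task receives which number.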