aggregate_num

aggregate_num(d, column, *, fn='sum', taxid_col='taxid')

Numerical variable aggregation along branches.

Aggregates a numerical variable of a DataFrame containing taxonomy ids along the branches of the lifemap tree, so that each ancestor node also receives the aggregated value of the observations below it.

Parameters

Name Type Description Default
d pd.DataFrame | pl.DataFrame DataFrame to aggregate data from. required
column str Name of the d column to aggregate. required
fn (sum, mean, min, max, median) Function used to aggregate the values, by default “sum”. "sum"
taxid_col str Name of the d column containing taxonomy ids, by default “taxid”. 'taxid'

Returns

Type Description
pl.DataFrame Aggregated DataFrame.

Raises

Type Description
ValueError If column is equal to “taxid”.
ValueError If fn is not one of the allowed values.

See Also

aggregate_count : aggregation of the number of observations.

aggregate_freq : aggregation of the values counts of a categorical variable.

Examples

>>> from pylifemap import aggregate_num
>>> import polars as pl
>>> d = pl.DataFrame({
...     "taxid": [33154, 33090, 2],
...     "value": [10, 5, 100]
... })
>>> aggregate_num(d, column="value", fn="sum")
shape: (5, 2)
┌───────┬───────┐
│ taxid ┆ value │
│ ---   ┆ ---   │
│ i32   ┆ i64   │
╞═══════╪═══════╡
│ 0     ┆ 115   │
│ 2     ┆ 100   │
│ 2759  ┆ 15    │
│ 33090 ┆ 5     │
│ 33154 ┆ 10    │
└───────┴───────┘
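
To make the idea of aggregating "along branches" concrete, the following is a minimal pure-Python sketch, not pylifemap's implementation. It assumes a hypothetical, hand-written `PARENT` mapping that mirrors the five nodes visible in the output above (the real lifemap tree is loaded by the library and is much larger), and the helper names `lineage` and `toy_aggregate` are invented for illustration. Each observed value is pushed to the taxon itself and to every ancestor up to the root, then reduced with the chosen function.

```python
from statistics import mean, median

# Hypothetical child -> parent links, for illustration only.
# They mirror the nodes in the example output; 0 is the root.
PARENT = {2: 0, 2759: 0, 33090: 2759, 33154: 2759}

# Reducers matching the names accepted by the `fn` parameter.
AGGREGATORS = {"sum": sum, "mean": mean, "min": min, "max": max, "median": median}


def lineage(taxid):
    """Return the taxid followed by all of its ancestors up to the root."""
    path = [taxid]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def toy_aggregate(values, fn="sum"):
    """Aggregate {taxid: value} along the branches of the toy tree."""
    if fn not in AGGREGATORS:
        raise ValueError(f"fn must be one of {sorted(AGGREGATORS)}")
    collected = {}
    for taxid, value in values.items():
        # Every node on the path to the root sees this observation.
        for node in lineage(taxid):
            collected.setdefault(node, []).append(value)
    return {node: AGGREGATORS[fn](vals) for node, vals in sorted(collected.items())}


result = toy_aggregate({33154: 10, 33090: 5, 2: 100})
print(result)
```

With `fn="sum"`, the sketch gives 15 for node 2759 (10 + 5) and 115 for the root (10 + 5 + 100), the same totals as the table above. How the library itself combines values for the other functions is not shown here; the sketch simply reduces over all observations found below each node.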