aggregate_num
aggregate_num(d, column, *, fn='sum', taxid_col='taxid')
Numerical variable aggregation along branches.
Aggregates a numerical variable in a DataFrame with taxonomy ids along the branches of the lifemap tree.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| d | pd.DataFrame \| pl.DataFrame | DataFrame to aggregate data from. | required |
| column | str | Name of the d column to aggregate. | required |
| fn | (sum, mean, min, max, median) | Function used to aggregate the values, by default “sum”. | "sum" |
| taxid_col | str | Name of the d column containing taxonomy ids, by default “taxid”. | 'taxid' |
Returns
| Name | Type | Description |
|---|---|---|
| | pl.DataFrame | Aggregated DataFrame. |
Raises
| Name | Type | Description |
|---|---|---|
| | ValueError | If column is equal to “taxid”. |
| | ValueError | If fn is not one of the allowed values. |
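As a rough illustration of the two documented error cases, the checks could look like the sketch below. The helper names `check_aggregate_args` and `is_valid` are hypothetical and not part of pylifemap; the library's actual validation code and error messages may differ.

```python
# Allowed aggregation function names, as listed in the Parameters table.
ALLOWED_FNS = ("sum", "mean", "min", "max", "median")


def check_aggregate_args(column, fn="sum"):
    """Hypothetical helper mirroring the two documented ValueError cases."""
    if column == "taxid":
        raise ValueError("column cannot be 'taxid'")
    if fn not in ALLOWED_FNS:
        raise ValueError(f"fn must be one of {ALLOWED_FNS}, got {fn!r}")


def is_valid(column, fn="sum"):
    """Return True when the arguments pass both checks, False otherwise."""
    try:
        check_aggregate_args(column, fn)
    except ValueError:
        return False
    return True
```

With this sketch, `is_valid("value")` passes, while `is_valid("taxid")` and `is_valid("value", "mode")` both fail.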
See also
aggregate_count : aggregation of the number of observations.
aggregate_freq : aggregation of the value counts of a categorical variable.
Examples
>>> from pylifemap import aggregate_num
>>> import polars as pl
>>> d = pl.DataFrame({
... "taxid": [33154, 33090, 2],
... "value": [10, 5, 100]
... })
>>> aggregate_num(d, column="value", fn="sum")
shape: (5, 2)
┌───────┬───────┐
│ taxid ┆ value │
│ ---   ┆ ---   │
│ i32   ┆ i64   │
╞═══════╪═══════╡
│ 0     ┆ 115   │
│ 2     ┆ 100   │
│ 2759  ┆ 15    │
│ 33090 ┆ 5     │
│ 33154 ┆ 10    │
└───────┴───────┘
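To see where the values in this output come from, the aggregation can be reproduced by hand with a small pure-Python sketch. It does not use pylifemap: the `PARENT` map is a toy tree assumed for illustration and chosen to match the rows shown above, and `lineage` and `aggregate_along_branches` are hypothetical helpers. How the library computes the other functions internally is not stated here; this sketch simply applies the chosen function to every observed value at or below each node.

```python
from collections import defaultdict
from statistics import mean, median

# Toy child -> parent map (an assumption for illustration only;
# the real lifemap tree is provided by pylifemap).
PARENT = {33154: 2759, 33090: 2759, 2759: 0, 2: 0}

AGG_FUNCS = {"sum": sum, "mean": mean, "min": min, "max": max, "median": median}


def lineage(taxid):
    """Return the taxid followed by all of its ancestors up to the root."""
    path = [taxid]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def aggregate_along_branches(values, fn="sum"):
    """Apply fn, for each node, to every value observed at or below it."""
    collected = defaultdict(list)
    for taxid, value in values.items():
        # Each observation contributes to its own node and to every ancestor.
        for node in lineage(taxid):
            collected[node].append(value)
    return {node: AGG_FUNCS[fn](vals) for node, vals in sorted(collected.items())}


result = aggregate_along_branches({33154: 10, 33090: 5, 2: 100})
print(result)
```

Under these assumptions, taxid 2759 receives 10 + 5 = 15 from its two children, and the root 0 receives all three values, 115 in total.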