aggregate_num
aggregate_num(d, column, *, fn='sum', taxid_col='taxid')
Numerical variable aggregation along branches.
Aggregates a numerical variable in a DataFrame with taxonomy ids along the branches of the lifemap tree.
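To illustrate what aggregating "along the branches" means, here is a minimal pure-Python sketch. It is not the pylifemap implementation: the `PARENT` map is a hypothetical, heavily simplified stand-in for the lifemap tree, and `aggregate_sum_along_branches` is an illustrative name. Each observed value is added to its own node and to every ancestor up to the root (taxid 0).

```python
from collections import defaultdict

# Hypothetical, simplified parent map (child taxid -> parent taxid).
# Taxid 0 stands for the root; the real lifemap tree is far larger.
PARENT = {33154: 2759, 33090: 2759, 2759: 0, 2: 0}


def aggregate_sum_along_branches(values):
    """Add each value to its own node and to all of its ancestors."""
    totals = defaultdict(int)
    for taxid, value in values.items():
        node = taxid
        while True:
            totals[node] += value
            if node not in PARENT:  # reached the root
                break
            node = PARENT[node]
    return dict(sorted(totals.items()))


result = aggregate_sum_along_branches({33154: 10, 33090: 5, 2: 100})
print(result)
# → {0: 115, 2: 100, 2759: 15, 33090: 5, 33154: 10}
```

With this toy tree, the totals match the `fn="sum"` output shown in the Examples section: the two eukaryote observations sum to 15 at taxid 2759, and all three sum to 115 at the root.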
Parameters
Name | Type | Description | Default |
---|---|---|---|
d | pd.DataFrame \| pl.DataFrame | DataFrame to aggregate data from. | required |
column | str | Name of the d column to aggregate. | required |
fn | (sum, mean, min, max, median) | Function used to aggregate the values, by default “sum”. | "sum" |
taxid_col | str | Name of the d column containing taxonomy ids, by default “taxid”. | 'taxid' |
Returns
Name | Type | Description |
---|---|---|
| pl.DataFrame | Aggregated DataFrame. |
Raises
Name | Type | Description |
---|---|---|
| ValueError | If column is equal to “taxid”. |
| ValueError | If fn is not one of the allowed values. |
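The two documented error conditions can be pictured with a small check like the one below. This is a sketch under assumptions, not pylifemap's actual code: `check_args` and `ALLOWED_FNS` are hypothetical names, and the error messages are invented for illustration.

```python
# Allowed aggregation function names, as listed in the Parameters table.
ALLOWED_FNS = ("sum", "mean", "min", "max", "median")


def check_args(column, fn="sum"):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if column == "taxid":
        raise ValueError("column must not be equal to 'taxid'")
    if fn not in ALLOWED_FNS:
        raise ValueError(f"fn must be one of {ALLOWED_FNS}, got {fn!r}")


check_args("value", fn="median")  # valid arguments: no exception raised

try:
    check_args("value", fn="prod")  # "prod" is not an allowed function
except ValueError as err:
    print(f"ValueError: {err}")
```

Callers that build `column` or `fn` from user input may want to catch `ValueError` in the same way around the real `aggregate_num` call.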
See also
aggregate_count: aggregation of the number of observations.
aggregate_freq: aggregation of the value counts of a categorical variable.
Examples
>>> from pylifemap import aggregate_num
>>> import polars as pl
>>> d = pl.DataFrame({
...     "taxid": [33154, 33090, 2],
...     "value": [10, 5, 100]
... })
>>> aggregate_num(d, column="value", fn="sum")
shape: (5, 2)
┌───────┬───────┐
│ taxid ┆ value │
│ ---   ┆ ---   │
│ i32   ┆ i64   │
╞═══════╪═══════╡
│ 0     ┆ 115   │
│ 2     ┆ 100   │
│ 2759  ┆ 15    │
│ 33090 ┆ 5     │
│ 33154 ┆ 10    │
└───────┴───────┘