aggregate_freq
aggregate_freq(d, column, *, taxid_col='taxid')
Categorical variable frequencies aggregation along branches.
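Aggregates a categorical variable from a DataFrame with taxonomy ids into level frequencies along the branches of the lifemap tree: each taxon receives the count of every level observed at that taxon or at any of its descendants.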
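To make the idea of aggregating along branches concrete, here is a minimal pure-Python sketch. It is not the pylifemap implementation: the `PARENT` map and the `aggregate_freq_sketch` helper are hypothetical names introduced for illustration, and the parent relations are assumed rather than read from the lifemap tree.

```python
from collections import Counter

# Hypothetical child -> parent map for a tiny slice of the tree (0 is the root).
# These relations are illustrative assumptions, not data read from pylifemap.
PARENT = {2: 0, 2759: 0, 33090: 2759, 33154: 2759}


def aggregate_freq_sketch(rows, parent=PARENT):
    """Count each (taxid, value) pair at the taxon itself and at all its ancestors."""
    counts = Counter()
    for taxid, value in rows:
        node = taxid
        while node is not None:
            counts[(node, value)] += 1
            node = parent.get(node)  # None once the root has been passed
    # Return sorted (taxid, value, count) triples for stable display.
    return sorted((t, v, n) for (t, v), n in counts.items())


rows = [(33154, "a"), (33090, "b"), (2, "a")]
result = aggregate_freq_sketch(rows)
for taxid, value, count in result:
    print(taxid, value, count)
```

Each observation is counted once at its own taxon and once at every ancestor up to the root, so the root ends up holding the frequencies of the whole dataset.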
Parameters
Name | Type | Description | Default |
---|---|---|---|
d | pd.DataFrame \| pl.DataFrame | DataFrame to aggregate data from. | required |
column | str | Name of the d column to aggregate. | required |
taxid_col | str | Name of the d column containing taxonomy ids, by default “taxid”. | 'taxid' |
Returns
Name | Type | Description |
---|---|---|
 | pl.DataFrame | Aggregated DataFrame. The “count” column contains the value counts as a polars struct. |
See also
aggregate_num: aggregation of a numeric variable.
aggregate_count: aggregation of the number of observations.
Examples
>>> from pylifemap import aggregate_freq
>>> import polars as pl
>>> d = pl.DataFrame({
...     "taxid": [33154, 33090, 2],
...     "value": ["a", "b", "a"]
... })
>>> aggregate_freq(d, column="value")
shape: (7, 3)
┌───────┬───────┬───────┐
│ taxid ┆ value ┆ count │
│ ---   ┆ ---   ┆ ---   │
│ i32   ┆ str   ┆ u32   │
╞═══════╪═══════╪═══════╡
│ 0     ┆ a     ┆ 2     │
│ 0     ┆ b     ┆ 1     │
│ 2     ┆ a     ┆ 1     │
│ 2759  ┆ a     ┆ 1     │
│ 2759  ┆ b     ┆ 1     │
│ 33090 ┆ b     ┆ 1     │
│ 33154 ┆ a     ┆ 1     │
└───────┴───────┴───────┘
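A common follow-up is to turn the aggregated counts into relative frequencies per taxon. The sketch below does this in plain Python on the rows of the example output above, copied by hand as (taxid, value, count) tuples. The variable names are illustrative, and nothing here is part of the pylifemap API.

```python
from collections import defaultdict

# Rows copied from the example output above: (taxid, value, count).
rows = [
    (0, "a", 2), (0, "b", 1), (2, "a", 1), (2759, "a", 1),
    (2759, "b", 1), (33090, "b", 1), (33154, "a", 1),
]

# Total number of observations aggregated at each taxon.
totals = defaultdict(int)
for taxid, _, count in rows:
    totals[taxid] += count

# Share of each level among the observations aggregated at its taxon.
proportions = {
    (taxid, value): count / totals[taxid] for taxid, value, count in rows
}

for (taxid, value), share in proportions.items():
    print(taxid, value, round(share, 3))
```

At the root (taxid 0), for instance, level "a" accounts for two of the three observations and level "b" for the remaining one.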