aggregate_freq
aggregate_freq(d, column, *, taxid_col='taxid')

Categorical variable frequencies aggregation along branches.
Aggregates a categorical variable from a DataFrame containing taxonomy ids, computing the frequencies of its levels along the branches of the lifemap tree.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| d | pd.DataFrame \| pl.DataFrame | DataFrame to aggregate data from. | required |
| column | str | Name of the d column to aggregate. | required |
| taxid_col | str | Name of the d column containing taxonomy ids, by default “taxid”. | 'taxid' |
Returns
| Name | Type | Description |
|---|---|---|
| | pl.DataFrame | Aggregated DataFrame. The “count” column contains the value counts as a polars struct. |
See also
aggregate_num : aggregation of a numeric variable.
aggregate_count : aggregation of the number of observations.
Examples
>>> from pylifemap import aggregate_freq
>>> import polars as pl
>>> d = pl.DataFrame({
... "taxid": [33154, 33090, 2],
... "value": ["a", "b", "a"]
... })
>>> aggregate_freq(d, column="value")
shape: (7, 3)
┌───────┬───────┬───────┐
│ taxid ┆ value ┆ count │
│ ---   ┆ ---   ┆ ---   │
│ i32   ┆ str   ┆ u32   │
╞═══════╪═══════╪═══════╡
│ 0     ┆ a     ┆ 2     │
│ 0     ┆ b     ┆ 1     │
│ 2     ┆ a     ┆ 1     │
│ 2759  ┆ a     ┆ 1     │
│ 2759  ┆ b     ┆ 1     │
│ 33090 ┆ b     ┆ 1     │
│ 33154 ┆ a     ┆ 1     │
└───────┴───────┴───────┘
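
To show what aggregation "along the branches" means, here is a minimal pure-Python sketch of the idea. It is not pylifemap's implementation. The `PARENT` map and the `aggregate_freq_sketch` helper are hypothetical names, and the toy tree is an assumption inferred from the taxids that appear in the output above (0 as the root, 2759 as the parent of 33090 and 33154). Each observation is counted at its own taxid and at every ancestor up to the root.

```python
from collections import Counter, defaultdict

# Hypothetical toy tree (child -> parent), assumed from the example output;
# the real lifemap tree is far larger and is handled by pylifemap itself.
PARENT = {2: 0, 2759: 0, 33090: 2759, 33154: 2759}


def aggregate_freq_sketch(rows, parent):
    """Count each categorical value at a taxid and at all of its ancestors."""
    counts = defaultdict(Counter)
    for taxid, value in rows:
        node = taxid
        while node is not None:
            counts[node][value] += 1
            node = parent.get(node)  # None once we step past the root
    # Flatten to sorted (taxid, value, count) tuples.
    return sorted(
        (taxid, value, n)
        for taxid, freq in counts.items()
        for value, n in freq.items()
    )


rows = [(33154, "a"), (33090, "b"), (2, "a")]
result = aggregate_freq_sketch(rows, PARENT)
for row in result:
    print(row)
```

Under the assumed tree, the printed tuples match the rows of the table above: the root (taxid 0) accumulates every observation, while an internal node such as 2759 only accumulates those of its descendants.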