aggregate_freq

aggregate_freq(d, column, *, taxid_col='taxid')

Categorical variable frequencies aggregation along branches.

Aggregates a categorical variable in a DataFrame with taxonomy ids along the branches of the lifemap tree, counting the frequency of each of its levels.

Parameters

Name Type Description Default
d pd.DataFrame | pl.DataFrame DataFrame to aggregate data from. required
column str Name of the d column to aggregate. required
taxid_col str Name of the d column containing taxonomy ids, by default “taxid”. 'taxid'

Returns

Name Type Description
pl.DataFrame Aggregated DataFrame. The “count” column contains the value counts as a polars struct.

See also

aggregate_num : aggregation of a numeric variable.

aggregate_count : aggregation of the number of observations.

Examples

>>> from pylifemap import aggregate_freq
>>> import polars as pl
>>> d = pl.DataFrame({
...     "taxid": [33154, 33090, 2],
...     "value": ["a", "b", "a"]
... })
>>> aggregate_freq(d, column="value")
shape: (7, 3)
┌───────┬───────┬───────┐
│ taxid ┆ value ┆ count │
│ ---   ┆ ---   ┆ ---   │
│ i32   ┆ str   ┆ u32   │
╞═══════╪═══════╪═══════╡
│ 0     ┆ a     ┆ 2     │
│ 0     ┆ b     ┆ 1     │
│ 2     ┆ a     ┆ 1     │
│ 2759  ┆ a     ┆ 1     │
│ 2759  ┆ b     ┆ 1     │
│ 33090 ┆ b     ┆ 1     │
│ 33154 ┆ a     ┆ 1     │
└───────┴───────┴───────┘
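
To make "aggregation along branches" concrete, the following plain-Python sketch reproduces the counts shown above. It is not pylifemap's implementation. The `PARENT` mapping is a hand-written, hypothetical fragment of the tree inferred from the example output (33154 and 33090 under 2759; 2759 and 2 under the root 0), and `aggregate_freq_sketch` is an illustrative name, not part of the library.

```python
from collections import Counter

# Hypothetical child -> parent fragment of the tree (assumption, for illustration only).
PARENT = {33154: 2759, 33090: 2759, 2759: 0, 2: 0}


def aggregate_freq_sketch(rows, parent=PARENT):
    """Count each value for every taxid and for all of its ancestors."""
    counts = Counter()
    for taxid, value in rows:
        node = taxid
        # Walk from the observed taxon up to the root, counting the value at each node.
        while node is not None:
            counts[(node, value)] += 1
            node = parent.get(node)  # None once we step past the root
    # Return sorted (taxid, value, count) triples.
    return sorted((t, v, n) for (t, v), n in counts.items())


rows = [(33154, "a"), (33090, "b"), (2, "a")]
result = aggregate_freq_sketch(rows)
for taxid, value, count in result:
    print(taxid, value, count)
```

Each observation contributes one count to its own taxid and one to every ancestor, which is why the root (taxid 0) ends up with the totals for the whole dataset.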