import polars as pl

iucn = pl.read_csv(
    "https://raw.githubusercontent.com/Lifemap-ToL/pylifemap/main/data/iucn.csv"
)
Getting started
To create a Lifemap data visualization, you will have to follow these steps:
- Prepare and load your data
- If needed, aggregate your data with an aggregation function
- Initialize a Lifemap object
- Add visualization layers
- Call show() or save() to display or save the result
Prepare your data
The data you want to visualize on the Lifemap tree of life must be in a pandas or polars DataFrame. It must contain observations (species) as rows and variables as columns, and one column must contain the NCBI taxonomy identifier of each species.
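As a rough illustration of this layout, the sketch below builds a small pandas DataFrame with one row per species and a taxid column, then runs a basic sanity check. The three rows are copied from the example dataset shown later in this guide. The check_taxid_column helper is hypothetical and not part of pylifemap, and the integer-dtype check is an assumption based on the i64 type of the example data.

```python
import pandas as pd

# Three rows copied from the IUCN example dataset shown later in this guide.
observations = pd.DataFrame(
    {
        "taxid": [651506, 2803960, 72259],
        "status": ["Data Deficient", "Critically Endangered", "Least Concern"],
    }
)


def check_taxid_column(df: pd.DataFrame, taxid_col: str = "taxid") -> pd.DataFrame:
    """Hypothetical helper (not part of pylifemap): basic checks on the taxid column."""
    if taxid_col not in df.columns:
        raise ValueError(f"Column {taxid_col!r} not found in the DataFrame")
    # Assumption: identifiers are stored as integers, as in the example dataset.
    if not pd.api.types.is_integer_dtype(df[taxid_col]):
        raise TypeError(f"Column {taxid_col!r} should contain integer identifiers")
    return df


# Returns the DataFrame unchanged when both checks pass.
check_taxid_column(observations)
```

A check like this can catch a misnamed or text-typed identifier column early, before the data is handed to the visualization.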
pylifemap includes an example dataset generated from The IUCN Red List of Threatened Species. It is a CSV file with the Red List category (in 2022) of more than 84000 species.
We can import it as a polars or pandas DataFrame with the following code:
import pandas as pd

iucn = pd.read_csv(
    "https://raw.githubusercontent.com/Lifemap-ToL/pylifemap/main/data/iucn.csv"
)
The resulting table only has two columns: taxid, which contains the species identifiers, and status, with the Red List category of each species.
iucn
taxid | status |
---|---|
i64 | str |
651506 | "Data Deficient" |
2803960 | "Critically Endangered" |
143610 | "Critically Endangered" |
2760993 | "Least Concern" |
72259 | "Least Concern" |
… | … |
337230 | "Least Concern" |
442623 | "Vulnerable" |
2303643 | "Critically Endangered" |
442625 | "Critically Endangered" |
442626 | "Least Concern" |
Initialize a Lifemap object
The next step is to create a new Lifemap object. To do this we have to pass it our DataFrame, as well as the name of the column with our taxonomy identifiers [1]:
from pylifemap import Lifemap

Lifemap(iucn, taxid_col="taxid")
/home/runner/work/pylifemap/pylifemap/src/pylifemap/data.py:98: UserWarning: 534 taxids have not been found in Lifemap database
warnings.warn(msg, stacklevel=0)
/home/runner/work/pylifemap/pylifemap/src/pylifemap/data.py:110: UserWarning: 152 duplicated taxids have been found in the data
warnings.warn(msg, stacklevel=0)
<LifemapWidget>
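The warnings above report taxids that are duplicated or that were not found in the Lifemap database. One way to inspect duplicates before building the map is sketched below with pandas. The toy DataFrame is hypothetical (one taxid is repeated on purpose), and how pylifemap itself counts or handles duplicates is not documented here, so this is an independent check rather than a reproduction of its logic.

```python
import pandas as pd

# Toy data (hypothetical): taxid 2803960 appears twice on purpose.
toy = pd.DataFrame(
    {
        "taxid": [651506, 2803960, 2803960, 72259],
        "status": [
            "Data Deficient",
            "Critically Endangered",
            "Critically Endangered",
            "Least Concern",
        ],
    }
)

# Number of rows whose taxid was already seen earlier in the table.
n_duplicated = int(toy["taxid"].duplicated().sum())

# Keep only the first row for each taxid.
deduplicated = toy.drop_duplicates(subset="taxid", keep="first")

print(f"{n_duplicated} duplicated taxid(s), {len(deduplicated)} rows kept")
```

Whether dropping duplicates is appropriate depends on the data: when the repeated rows carry different values, aggregating them may be a better choice than keeping the first one.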
When creating the Lifemap object, we can also pass other arguments, such as the width and height of our visualization, either as a number of pixels or as a CSS unit.
For example, the following initialization would make the visualization take the full available width, with a height of 800 pixels.
Lifemap(iucn, taxid_col="taxid", width="100%", height=800)
Add visualization layers
After initializing our Lifemap object, we have to add visualization layers to create graphical representations. There are several different layers available:
Layer | Description |
---|---|
layer_points | Displays each observation with a point. Radius and color can depend on an attribute in the DataFrame. |
layer_lines | Using aggregated data, highlights branches of the tree with lines of varying width and color. |
layer_donuts | Displays aggregated categorical data as donut charts. |
layer_heatmap | Displays a heatmap of the observations distribution in the tree. |
layer_screengrid | Displays the distribution of observations as a colored grid with fixed-size cells. |
To add a layer, we just have to call the corresponding layer_ method of our Lifemap object. For example, to add a points layer:
Lifemap(iucn, taxid_col="taxid").layer_points()
We can add several layers by chaining several method calls. For example, we could display a heatmap layer and a points layer above it:
Lifemap(iucn, taxid_col="taxid").layer_heatmap().layer_points()
Show or save the visualization
Just adding layers is not sufficient to see our visualization. For it to appear, we have to call the show() method:
Lifemap(iucn, taxid_col="taxid").layer_points().show()
In a notebook environment, calling show() will display the visualization as a widget. When called from a Python script or a textual Python REPL, the visualization will be saved to a temporary file and, if possible, displayed in the user’s browser. When called from a Python script running inside our Docker container, it will be saved to a file in the working directory.
We can also save it to an HTML file, which can be opened later in a browser, by using the save() method:
Lifemap(iucn, taxid_col="taxid").layer_points().save("lifemap.html")
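To reopen a saved file later from Python, one option is the standard library's webbrowser module. The sketch below is independent of pylifemap: the two helper functions are hypothetical, and the file name simply reuses the "lifemap.html" example above.

```python
import webbrowser
from pathlib import Path


def saved_map_uri(path: str) -> str:
    """Return a file:// URI for a saved HTML file (hypothetical helper)."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(path).resolve().as_uri()


def open_saved_map(path: str) -> bool:
    """Ask the default browser to open the saved file; True if a browser was launched."""
    return webbrowser.open(saved_map_uri(path))


print(saved_map_uri("lifemap.html"))
```

Calling open_saved_map("lifemap.html") would then try to open the file in the default browser, which may not be possible on a machine without a graphical environment.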
Customize the layers
Each layer accepts a number of arguments to customize its appearance. For example, we can change the radius and opacity of our points and make their color depend on their status value:
(
    Lifemap(iucn, taxid_col="taxid")
    .layer_points(fill_col="status", radius=3, opacity=0.5)
    .show()
)
Aggregate data
pylifemap provides several aggregation functions that aggregate data along the branches of the tree:
Function | Description |
---|---|
aggregate_count | Aggregates the number of children of each tree node. |
aggregate_num | Aggregates a numerical variable along the tree branches with a given function (sum, mean, max…). |
aggregate_freq | Aggregates the frequencies of the levels of a categorical variable. |
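To give an intuition of what counting along the branches means, here is a conceptual sketch in plain Python. It is not pylifemap's implementation: the toy taxonomy, its identifiers and the count_along_branches function are all hypothetical, and the sketch assumes that each observation is credited to its own node and to every ancestor up to the root.

```python
from collections import Counter

# Toy taxonomy (hypothetical): child taxid -> parent taxid, with 0 as the root.
PARENT = {1: 0, 2: 0, 10: 1, 11: 1, 20: 2}


def count_along_branches(taxids, parent):
    """Count, for every node, how many observations fall in its subtree."""
    counts = Counter()
    for taxid in taxids:
        node = taxid
        # Walk from the observation up to the root, crediting each node on the way.
        while node is not None:
            counts[node] += 1
            node = parent.get(node)  # None once the root has been processed
    return dict(counts)


observed = [10, 11, 20]  # three observed leaf taxa
counts = count_along_branches(observed, PARENT)
print(counts)
```

In this toy tree the root ends up with a count of 3, node 1 with 2 and node 2 with 1, which mirrors how an aggregated count grows toward the root of the tree.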
As an example with the IUCN data, we can keep only the species that have an “Extinct” status:
# With polars
iucn_extinct = iucn.filter(pl.col("status") == "Extinct")
# With pandas
iucn_extinct = iucn[iucn["status"] == "Extinct"]
We can then aggregate their count along the branches with aggregate_count:
from pylifemap import aggregate_count

iucn_extinct_agg = aggregate_count(iucn_extinct)
iucn_extinct_agg
taxid | n |
---|---|
i32 | u32 |
null | 2 |
0 | 196 |
2759 | 196 |
3193 | 21 |
3268 | 1 |
… | … |
3073809 | 50 |
3073812 | 1 |
3076244 | 1 |
3078114 | 76 |
3136023 | 1 |
Finally, we can represent this new dataset with a lines layer.
(
    Lifemap(iucn_extinct_agg)
    .layer_lines(color_col="n", width_col="n", label="Extinct species")
    .show()
)
/home/runner/work/pylifemap/pylifemap/src/pylifemap/data.py:98: UserWarning: 3 taxids have not been found in Lifemap database: [None, 1914395, 3041918]
warnings.warn(msg, stacklevel=0)
Footnotes
1. If your column is named “taxid”, you can omit the taxid_col argument, as this is its default value.