
A dataset containing NCBI information on 1000 eukaryotes. The variables are described under Format.

Usage

data(eukaryotes_1000)

Format

A data frame with 1000 rows and 19 variables:

X.Organism.Name

Organism name at the species level

taxid

NCBI Taxonomy identifier (taxid)

BioProject.Accession

BioProject Accession number (from BioProject database)

BioProject.ID

BioProject ID

Group

Commonly used organism groups: Animals, Fungi, Plants, Protists

SubGroup

NCBI Taxonomy level below group: Mammals, Birds, Fishes, Flatworms, Insects, Amphibians, Reptiles, Roundworms, Ascomycetes, Basidiomycetes, Land Plants, Green Algae, Apicomplexans, Kinetoplasts

Size..Mb.

Total length of DNA submitted for the project, in megabases (Mb)

GC.

Percentage of guanine and cytosine (GC) bases in the DNA submitted for the project

Assembly.Accession

Accession of the genome assembly (from NCBI Assembly database)

Replicons

Number of replicons in the assembly

WGS

Four-letter Accession prefix followed by version as defined in WGS division of GenBank/INSDC

Scaffolds

Number of scaffolds in the assembly

Genes

Number of genes annotated in the assembly

Proteins

Number of proteins annotated in the assembly

Release.Date

First public sequence release for the project

Modify.Date

Sequence modification date for the project

Status

Highest level of assembly:
Chromosomes: one or more chromosomes are assembled
Scaffolds or contigs: sequence assembled but no chromosomes

Center

Origin of the sample

BioSample.Accession

BioSample Accession number
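
Examples

The columns documented above can be summarised per organism group. The following is a minimal sketch, not part of the package: `summarise_by_group()` is a hypothetical helper, and it assumes that `Size..Mb.` and `GC.` hold numeric values (or text that converts to numbers) and that the package providing `eukaryotes_1000` is already attached.

```r
# Hypothetical helper (not provided by the package): median genome size and
# GC percentage for each value of the Group column.
summarise_by_group <- function(df) {
  stopifnot(all(c("Group", "Size..Mb.", "GC.") %in% names(df)))

  # Coerce defensively in case the columns were stored as character or factor;
  # values that cannot be converted become NA.
  tmp <- data.frame(
    Group = as.character(df[["Group"]]),
    size_mb = suppressWarnings(as.numeric(as.character(df[["Size..Mb."]]))),
    gc_percent = suppressWarnings(as.numeric(as.character(df[["GC."]]))),
    stringsAsFactors = FALSE
  )

  # Keep rows with missing values and let median() skip the NAs instead.
  aggregate(cbind(size_mb, gc_percent) ~ Group, data = tmp,
            FUN = median, na.action = na.pass, na.rm = TRUE)
}

# Usage, assuming data(eukaryotes_1000) has been run beforehand.
if (exists("eukaryotes_1000")) {
  print(summarise_by_group(eukaryotes_1000))
}
```

The result has one row per group, with the columns `Group`, `size_mb`, and `gc_percent`. The median is used rather than the mean because genome sizes within a group can differ by orders of magnitude.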