I have a list of taxids that looks like this:
1204725
2162
1300163
420247
For each of these taxids, I would like to produce a file that lists the taxonomic IDs of its lineage in this order:
kingdom_id phylum_id class_id order_id family_id genus_id species_id
I am using the ete3 package and its ete3 ncbiquery tool, which reports the lineage of each taxid. I run it on my Linux laptop with this command:
ete3 ncbiquery --search 1204725 2162 13000163 420247 --info
The result looks like this:
# Taxid Sci.Name Rank Named Lineage Taxid Lineage
2162 Methanobacterium formicicum species root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobacterium,Methanobacterium formicicum 1,131567,2157,28890,183925,2158,2159,2160,2162
1204725 Methanobacterium formicicum DSM 3637 no rank root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobacterium,Methanobacterium formicicum,Methanobacterium formicicum DSM 3637 1,131567,2157,28890,183925,2158,2159,2160,2162,1204725
420247 Methanobrevibacter smithii ATCC 35061 no rank root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobrevibacter,Methanobrevibacter smithii,Methanobrevibacter smithii ATCC 35061 1,131567,2157,28890,183925,2158,2159,2172,2173,420247
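The Named Lineage and Taxid Lineage columns are parallel lists, so zipping them position by position shows which ID belongs to which name. A minimal sketch, assuming the two columns have already been cut out of a result line as strings and that no scientific name contains a comma; the function name pair_lineage is hypothetical:

```python
def pair_lineage(named_lineage, taxid_lineage):
    """Pair each name in the Named Lineage column with its taxid.

    Assumes both columns are comma-separated and that no scientific
    name itself contains a comma.
    """
    names = named_lineage.split(",")
    taxids = [int(t) for t in taxid_lineage.split(",")]
    if len(names) != len(taxids):
        raise ValueError("lineage columns have different lengths")
    return list(zip(names, taxids))


if __name__ == "__main__":
    # Columns copied from the 2162 row of the ncbiquery output
    named = ("root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,"
             "Methanobacteriales,Methanobacteriaceae,Methanobacterium,"
             "Methanobacterium formicicum")
    ids = "1,131567,2157,28890,183925,2158,2159,2160,2162"
    for name, taxid in pair_lineage(named, ids):
        print(taxid, name)
```

For 2162 this puts 28890 next to Euryarchaeota, 183925 next to Methanobacteria, and so on. The rank names themselves (phylum, class, ...) are not part of this output; ete3's ncbi.get_rank supplies them, which is what the code in the answer relies on.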
I have no idea which of these IDs, if any, correspond to the ranks I am looking for.
Classification, or taxonomy, is a system of categorizing living things. Below the domain level there are seven major ranks, read from least to most specific: (1) Kingdom; (2) Phylum (or Division); (3) Class; (4) Order; (5) Family; (6) Genus; (7) Species. A mnemonic often used to remember this order is "King Philip Can Only Find Green Socks."
The following code:
import csv
from ete3 import NCBITaxa

ncbi = NCBITaxa()  # uses the local NCBI taxonomy database (downloaded on first use)


def get_desired_ranks(taxid, desired_ranks):
    # Full lineage as a list of taxids, from the root down to taxid itself
    lineage = ncbi.get_lineage(taxid)
    # Map each taxid in the lineage to its rank name ('phylum', 'no rank', ...)
    lineage2ranks = ncbi.get_rank(lineage)
    # Invert to rank -> taxid so each desired rank can be looked up directly
    ranks2lineage = {rank: tid for tid, rank in lineage2ranks.items()}
    return {'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>')
            for rank in desired_ranks}


def main(taxids, desired_ranks, path):
    # newline='' stops the csv module from writing blank lines on Windows
    with open(path, 'w', newline='') as csvfile:
        fieldnames = ['{}_id'.format(rank) for rank in desired_ranks]
        writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=fieldnames)
        writer.writeheader()
        for taxid in taxids:
            writer.writerow(get_desired_ranks(taxid, desired_ranks))


if __name__ == '__main__':
    taxids = [1204725, 2162, 1300163, 420247]
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    path = 'taxids.csv'  # tab-separated despite the .csv extension
    main(taxids, desired_ranks, path)
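Because NCBITaxa needs the local NCBI database, the rank lookup and file writing can be checked offline by swapping the two ete3 calls for plain dictionaries. A minimal sketch: LINEAGES and RANKS are hypothetical hand-built stand-ins for ncbi.get_lineage and ncbi.get_rank, filled in from the ncbiquery output above, and the rank 'superkingdom' for 2157 (Archaea) is an assumption about the NCBI dump in use:

```python
import csv
import io

# Hypothetical stand-ins for ncbi.get_lineage() and ncbi.get_rank(),
# hand-filled from the ncbiquery output. The rank of 2157 is assumed.
LINEAGES = {
    2162: [1, 131567, 2157, 28890, 183925, 2158, 2159, 2160, 2162],
    420247: [1, 131567, 2157, 28890, 183925, 2158, 2159, 2172, 2173, 420247],
}
RANKS = {
    1: "no rank", 131567: "no rank", 2157: "superkingdom",
    28890: "phylum", 183925: "class", 2158: "order", 2159: "family",
    2160: "genus", 2162: "species", 2172: "genus", 2173: "species",
    420247: "no rank",
}


def get_desired_ranks(taxid, desired_ranks):
    # Same idea as the ete3 version: build a rank -> taxid lookup.
    # Duplicate "no rank" entries overwrite each other, which is harmless.
    ranks2lineage = {RANKS[t]: t for t in LINEAGES[taxid]}
    return {f"{rank}_id": ranks2lineage.get(rank, "<not present>")
            for rank in desired_ranks}


def write_table(taxids, desired_ranks, out):
    # Write a tab-separated table to any text file object
    fieldnames = [f"{rank}_id" for rank in desired_ranks]
    writer = csv.DictWriter(out, delimiter="\t", fieldnames=fieldnames)
    writer.writeheader()
    for taxid in taxids:
        writer.writerow(get_desired_ranks(taxid, desired_ranks))


if __name__ == "__main__":
    desired = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]
    buf = io.StringIO()
    write_table([2162, 420247], desired, buf)
    print(buf.getvalue())
```

Writing to an io.StringIO instead of a path keeps the check free of files; passing an opened file object to write_table works the same way.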
Produces a file that looks like this:
kingdom_id phylum_id class_id order_id family_id genus_id species_id
<not present> 28890 183925 2158 2159 2160 2162
<not present> 28890 183925 2158 2159 2160 2162
<not present> 28890 183925 2158 2159 2160 2162
<not present> 28890 183925 2158 2159 2172 2173
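The kingdom column is &lt;not present&gt; in every row because no taxon in these archaeal lineages carries the rank "kingdom"; in the NCBI dump behind this output, Archaea (2157) is most likely ranked "superkingdom" (recent NCBI releases use "domain" for that level). If that ID is wanted in the first column, one option is a per-column list of fallback ranks. A minimal sketch: the FALLBACKS table and the LINEAGE/RANKS stub data are hypothetical and stand in for ncbi.get_lineage(2162) and ncbi.get_rank:

```python
# Hypothetical stub data standing in for ncbi.get_lineage(2162) and
# ncbi.get_rank(); the rank of 2157 (Archaea) is an assumption.
LINEAGE = [1, 131567, 2157, 28890, 183925, 2158, 2159, 2160, 2162]
RANKS = {
    1: "no rank", 131567: "no rank", 2157: "superkingdom",
    28890: "phylum", 183925: "class", 2158: "order",
    2159: "family", 2160: "genus", 2162: "species",
}

# Ranks to try, in order, for an output column (hypothetical choice)
FALLBACKS = {"kingdom": ["kingdom", "superkingdom", "domain"]}


def pick_ranks(lineage, ranks, desired_ranks, fallbacks=FALLBACKS):
    rank2taxid = {ranks[t]: t for t in lineage}
    row = {}
    for rank in desired_ranks:
        # Use the first candidate rank that exists in this lineage
        candidates = fallbacks.get(rank, [rank])
        row[f"{rank}_id"] = next(
            (rank2taxid[c] for c in candidates if c in rank2taxid),
            "<not present>",
        )
    return row


if __name__ == "__main__":
    desired = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]
    print(pick_ranks(LINEAGE, RANKS, desired))
```

With ete3, the stubs would be replaced by lineage = ncbi.get_lineage(taxid) and ncbi.get_rank(lineage). Note that the header still says kingdom_id even when the value is a superkingdom, so a column name such as top_rank_id may be less misleading.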