 

Constructing a phylogenetic tree

I have a list of lists of lists like this:

matches = [[['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', 'Firmicutes'], ['class', 'Clostridia'], ['order', 'Clostridiales'], ['family', 'Lachnospiraceae'], ['genus', 'Lachnospira']], 
           [['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', '"Proteobacteria"'], ['class', 'Gammaproteobacteria'], ['order', '"Vibrionales"'], ['family', 'Vibrionaceae'], ['genus', 'Catenococcus']], 
           [['rootrank', 'Root'], ['domain', 'Archaea'], ['phylum', '"Euryarchaeota"'], ['class', '"Methanomicrobia"'], ['order', 'Methanomicrobiales'], ['family', 'Methanomicrobiaceae'], ['genus', 'Methanoplanus']]]

And I want to construct a phylogenetic tree from them. I wrote a node class like so (based partially on this code):

class Node(object):
    """Generic n-ary tree node object
    Children are additive; no provision for deleting them."""

    def __init__(self, parent, category=None, name=None):
        self.parent = parent
        self.category = category
        self.name = name
        self.childList = []

        if parent is None:
            self.birthOrder = 0
        else:
            self.birthOrder = len(parent.childList)
            parent.childList.append(self)

    def fullPath(self):
        """Returns a list of children from root to self"""
        result = []
        parent = self.parent
        kid = self

        while parent:
            result.insert(0, kid)
            parent, kid = parent.parent, parent

        return result

    def ID(self):
        return '{0}|{1}'.format(self.category, self.name)

And then I try to construct my tree like this:

node = None
for match in matches:
    for branch in match:
        category, name = branch
        node = Node(node, category, name)
        print([n.ID() for n in node.fullPath()])

This works for the first match, but when I start with the second match it is appended at the end of the tree instead of starting again at the top. How would I do that? I tried some variations on searching for the ID, but I can't get it to work.
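A minimal sketch of one way to get this behaviour, assuming the Node class above (copied here without fullPath so the block runs on its own): reset `node` to a shared root at the start of every match instead of continuing from the previous leaf, and before creating a child, look for an existing child with the same (category, name). The helpers `find_or_add_child`, `build_tree` and `render`, and the unnamed sentinel root created with `Node(None)`, are hypothetical additions, not part of the original code.

```python
class Node(object):
    """Generic n-ary tree node, as in the question (fullPath omitted)."""

    def __init__(self, parent, category=None, name=None):
        self.parent = parent
        self.category = category
        self.name = name
        self.childList = []
        if parent is None:
            self.birthOrder = 0
        else:
            self.birthOrder = len(parent.childList)
            parent.childList.append(self)

    def ID(self):
        return '{0}|{1}'.format(self.category, self.name)


def find_or_add_child(parent, category, name):
    """Hypothetical helper: return the child of `parent` with this
    (category, name), creating it only if no such child exists yet."""
    for child in parent.childList:
        if child.category == category and child.name == name:
            return child
    return Node(parent, category, name)


def build_tree(matches):
    """Hypothetical helper: merge every lineage into one tree."""
    root = Node(None)  # unnamed sentinel root shared by all lineages
    for match in matches:
        node = root  # restart at the top, not at the previous leaf
        for category, name in match:
            node = find_or_add_child(node, category, name)
    return root


def render(node, depth=0):
    """Return one indented 'category|name' line per node below `node`."""
    lines = []
    for child in node.childList:
        lines.append('  ' * depth + child.ID())
        lines.extend(render(child, depth + 1))
    return lines


# The data from the question.
matches = [[['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', 'Firmicutes'], ['class', 'Clostridia'], ['order', 'Clostridiales'], ['family', 'Lachnospiraceae'], ['genus', 'Lachnospira']],
           [['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', '"Proteobacteria"'], ['class', 'Gammaproteobacteria'], ['order', '"Vibrionales"'], ['family', 'Vibrionaceae'], ['genus', 'Catenococcus']],
           [['rootrank', 'Root'], ['domain', 'Archaea'], ['phylum', '"Euryarchaeota"'], ['class', '"Methanomicrobia"'], ['order', 'Methanomicrobiales'], ['family', 'Methanomicrobiaceae'], ['genus', 'Methanoplanus']]]

root = build_tree(matches)
print('\n'.join(render(root)))
```

With this approach the shared prefix `rootrank|Root` → `domain|Bacteria` is created only once, and the second lineage branches off at `phylum|"Proteobacteria"`. The linear search over `childList` is fine for a handful of children; for very wide trees a dict keyed by `ID()` on each node would avoid it.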

asked Aug 08 '13 14:08 by BioGeek


1 Answer

I would highly recommend using a phylogenetics library such as DendroPy.

The standard way of writing phylogenetic trees is the Newick format (nested parenthetical statements such as ((A,B),C)). If you use DendroPy, reading such a tree is as simple as

>>> import dendropy
>>> tree1 = dendropy.Tree.get_from_string("((A,B),(C,D))", schema="newick")

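or, to read from a file (DendroPy 4 syntax; the older 3.x API used the constructor form dendropy.Tree(stream=open("mle.tre"), schema="newick")):

>>> tree1 = dendropy.Tree.get(path="mle.tre", schema="newick")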

The creator of the library maintains a nice tutorial too.
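To get the question's data into Newick form in the first place, here is a minimal, dependency-free sketch (standard library only, assuming Python 3.7+ so that dicts keep insertion order). It merges the lineages into nested dicts and serializes them as `(child,child)name`. The function names are hypothetical; ranks are dropped and nodes are keyed by name alone, and any label that is not purely alphanumeric (for example `"Proteobacteria"` with its double quotes) is wrapped in single quotes, which Newick permits for labels.

```python
import re

# Labels made only of letters and digits are written unquoted.
_PLAIN_LABEL = re.compile(r'^[A-Za-z0-9]+$')


def newick_label(name):
    """Single-quote a label (doubling embedded ') unless it is alphanumeric."""
    if _PLAIN_LABEL.match(name):
        return name
    return "'" + name.replace("'", "''") + "'"


def build_nested(matches):
    """Merge lineages into nested dicts: {name: {child_name: {...}}}."""
    tree = {}
    for lineage in matches:
        level = tree
        for _rank, name in lineage:  # rank is ignored in this sketch
            level = level.setdefault(name, {})
    return tree


def node_to_newick(name, children):
    """'(child,child)name' for an internal node, just 'name' for a leaf."""
    label = newick_label(name)
    if not children:
        return label
    inner = ','.join(node_to_newick(n, c) for n, c in children.items())
    return '(' + inner + ')' + label


def matches_to_newick(matches):
    """Serialize all lineages as one Newick string ending in ';'."""
    forest = build_nested(matches)
    if len(forest) == 1:
        name, children = next(iter(forest.items()))
        body = node_to_newick(name, children)
    else:  # several top-level names: join them under an unlabeled root
        body = '(' + ','.join(node_to_newick(n, c) for n, c in forest.items()) + ')'
    return body + ';'


# The data from the question.
matches = [[['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', 'Firmicutes'], ['class', 'Clostridia'], ['order', 'Clostridiales'], ['family', 'Lachnospiraceae'], ['genus', 'Lachnospira']],
           [['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', '"Proteobacteria"'], ['class', 'Gammaproteobacteria'], ['order', '"Vibrionales"'], ['family', 'Vibrionaceae'], ['genus', 'Catenococcus']],
           [['rootrank', 'Root'], ['domain', 'Archaea'], ['phylum', '"Euryarchaeota"'], ['class', '"Methanomicrobia"'], ['order', 'Methanomicrobiales'], ['family', 'Methanomicrobiaceae'], ['genus', 'Methanoplanus']]]

newick = matches_to_newick(matches)
print(newick)
```

The resulting string could then be passed to dendropy.Tree.get_from_string(newick, schema="newick") as shown above; that step is not exercised here.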

answered Oct 12 '22 11:10 by dudu