
Phylo BioPython building trees

I am trying to build a tree with Biopython's Phylo module.
What I've done so far is shown in this image: (image)

Each name is a four-digit number followed by a - and another number; the second number is the number of times that sequence is represented. For example, for 1578 - 22, the node should represent 22 sequences.
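
As a minimal sketch of that naming convention, a label can be split at its last hyphen to recover the count. The helper name `parse_label` is hypothetical, and the sketch assumes every label ends in `-<integer>`, with or without spaces around the hyphen.

```python
def parse_label(label):
    """Split a label such as '1578-22' into (sequence_id, count).

    Assumes the count is the integer after the last '-'.
    """
    seq_id, _, count = label.rpartition("-")
    # int() tolerates surrounding whitespace, so '1578 - 22' also works
    return seq_id.strip(), int(count)


print(parse_label("1578-22"))    # ('1578', 22)
print(parse_label("1578 - 22"))  # ('1578', 22)
```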

The file with the aligned sequences: file
The file with the distances used to build the tree: file

So now I know how to change the size of each node. Each node has a different size, which is easy to do with an array of the different values:

    from numpy import array  # assuming array() here is NumPy's

    # Map each label on a '>' header line to the count after the last '-'.
    list_size = {}
    with open(MEDIA_ROOT + "groupsnp.txt") as fh:
        for line in fh:
            if '>' in line:
                labels = line.split('>')
                label = labels[-1].split()
                num = line.split('-')
                size = num[-1].split()
                for lab in label:
                    for number in size:
                        list_size[lab] = int(number)

    a = array(list(list_size.values()))

But the order of the array is arbitrary. I would like to assign the correct size to the right node. I tried this:

    for elem in list_size.keys():
        if labels == elem:
            Phylo.draw_graphviz(tree_xml, prog="neato", node_size=a)

but nothing appears when I use the if statement.
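
One likely reason, going only by the snippets above: if `labels` still holds the result of `line.split('>')`, it is a list, while each `elem` is a string key, and a list never compares equal to a string. The header line below is a made-up example, not taken from the actual file.

```python
line = ">1578-22\n"        # hypothetical header line
labels = line.split(">")   # a list: ['', '1578-22\n']
elem = "1578-22"           # dictionary keys are strings

# Comparing a list with a string is always False, so the body never runs
list_equals_str = (labels == elem)

# Comparing string with string (after stripping the newline) can match
stripped_match = (labels[-1].strip() == elem)

print(list_equals_str, stripped_match)  # False True
```

Even with a matching comparison, `node_size=a` would pass the whole array on every call, so the sizes still have to be ordered to follow the nodes being drawn.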

Is there any way of doing this?

I would really appreciate any help!

Thanks everybody

psoares asked Oct 29 '10 11:10

1 Answer

I finally got this working. The basic premise is that you use the labels/nodelist to build your node_sizes, so the two line up. I'm sure I'm missing some options that would make the tree look perfect, but the node sizes appear to be showing up properly.

    # Basically a stripped-down rewrite of Phylo.draw_graphviz
    import networkx, pylab
    from Bio import Phylo


    # Taken from draw_graphviz: yield (node, label) for every node whose
    # str() is not just its class name
    def get_label_mapping(G, selection):
        for node in G.nodes():
            if (selection is None) or (node in selection):
                try:
                    label = str(node)
                    if label not in (None, node.__class__.__name__):
                        yield (node, label)
                except (LookupError, AttributeError, ValueError):
                    pass


    kwargs = {}
    tree = Phylo.read('tree.dnd', 'newick')
    G = Phylo.to_networkx(tree)
    Gi = networkx.convert_node_labels_to_integers(G, discard_old_labels=False)

    node_sizes = []
    labels = dict(get_label_mapping(G, None))
    kwargs['nodelist'] = list(labels.keys())

    # Build the node sizes from the labels, because the labels are also used
    # for the nodelist; this way the sizes line up with the right nodes.
    for label in labels.keys():
        if str(label) != "Clade":
            num = label.name.split('-')
            # The factor of 50 is just a guess at what would look best
            size = int(num[-1]) * 50
            node_sizes.append(size)

    kwargs['node_size'] = node_sizes
    posi = networkx.pygraphviz_layout(Gi, 'neato', args='')
    posn = dict((n, posi[Gi.node_labels[n]]) for n in G)

    networkx.draw(G, posn, labels=labels, node_color='#c0deff', **kwargs)

    pylab.show()

Resulting tree: (image)
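
The premise above, deriving node_size from the same sequence that is passed as nodelist, can be sketched without networkx or Graphviz. This is only an illustration, not the code that produced the tree: `size_for`, `sizes_for_nodes`, the sample names, and the fallback size for unnamed nodes are all assumptions.

```python
def size_for(name, scale=50, default=50):
    """Return a drawing size for one node name such as '1578-22'.

    Names that are missing or lack a '-<integer>' suffix get the default.
    """
    if not name or "-" not in name:
        return default
    try:
        return int(name.rsplit("-", 1)[1]) * scale
    except ValueError:
        return default


def sizes_for_nodes(node_names, scale=50, default=50):
    """Build one size per node, in exactly the order of node_names."""
    return [size_for(name, scale, default) for name in node_names]


# Hypothetical node names; None stands in for an unnamed internal node
nodelist = ["1578-22", "0342-3", None, "2210-1"]
node_sizes = sizes_for_nodes(nodelist)
print(node_sizes)  # [1100, 150, 50, 50]
```

Because both lists come from the same iteration order, position i in `node_sizes` always describes node i in `nodelist`, which is what keeps the sizes attached to the right nodes.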

rwilliams answered Sep 19 '22 02:09