Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to collapse branches in a phylogenetic tree by the label in their nodes or leaves?

I have built a phylogenetic tree for a protein family that can be split into different groups, classifying each one by its type of receptor or type of response. The nodes in the tree are labeled as the type of receptor.

In the phylogenetic tree I can see that proteins that belong to the same groups or type of receptor have clustered together in the same branches. So I would like to collapse these branches that have labels in common, grouping them by a given list of keywords.

The command would be something like this:

./collapse_tree_by_label -f phylogenetic_tree.newick -l list_of_labels_to_collapse.txt -o collapsed_tree.eps(or pdf)

My list_of_labels_to_collapse.txt would be like this: A B C D

My newick tree would be like this: (A_1:0.05,A_2:0.03,A_3:0.2,A_4:0.1):0.9,(((B_1:0.05,B_2:0.02,B_3:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2)

The output image without collapsing is like this: http://i.stack.imgur.com/pHkoQ.png

The output image collapsing should be like this (collapsed_tree.eps): http://i.stack.imgur.com/TLXd0.png

The width of the triangles should represent the branch length, and the high of the triangles must represent the number of nodes in the branch.

I have been playing with the "ape" package in R. I was able to plot a phylogenetic tree, but I still can't figure out how to collapse the branches by keywords in the labels:

require("ape")

This will load the tree:

cat("((A_1:0.05,A_2:0.03,A_3:0.2,A_4:0.1):0.9,(((B_1:0.05,B_2:0.02,B_3:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n")
tree.test <- read.tree("ex.tre")

Here should be the code to collapse

This will plot the tree:

plot(tree.test)
like image 386
RDlady Avatar asked Dec 21 '15 20:12

RDlady


2 Answers

Your tree as it is stored in R already has the tips stored as polytomies. It's just a matter of plotting the tree with triangles representing the polytomies.

There is no function in ape to do this, that I am aware of, but if you mess with the plotting function a little bit you can pull it off

# Step 1: make edges for descendent nodes invisible in plot:
groups <- c("A", "B", "C", "D")
group_edges <- numeric(0)
for(group in groups){
  group_edges <- c(group_edges,getMRCA(tree.test,tree.test$tip.label[grepl(group, tree.test$tip.label)]))
}
edge.width <- rep(1, nrow(tree.test$edge))
edge.width[tree.test$edge[,1] %in% group_edges ] <- 0


# Step 2: plot the tree with the hidden edges
plot(tree.test, show.tip.label = F, edge.width = edge.width)

# Step 3: add triangles
add_polytomy_triangle <- function(phy, group){
  root <- length(phy$tip.label)+1
  group_node_labels <- phy$tip.label[grepl(group, phy$tip.label)]
  group_nodes <- which(phy$tip.label %in% group_node_labels)
  group_mrca <- getMRCA(phy,group_nodes)

  tip_coord1 <- c(dist.nodes(phy)[root, group_nodes[1]], group_nodes[1])
  tip_coord2 <- c(dist.nodes(phy)[root, group_nodes[1]], group_nodes[length(group_nodes)])
  node_coord <- c(dist.nodes(phy)[root, group_mrca], mean(c(tip_coord1[2], tip_coord2[2])))

  xcoords <- c(tip_coord1[1], tip_coord2[1], node_coord[1])
  ycoords <- c(tip_coord1[2], tip_coord2[2], node_coord[2])
  polygon(xcoords, ycoords)
}

Then you just have to loop through the groups to add the triangles

for(group in groups){
  add_polytomy_triangle(tree.test, group)
}
like image 173
C_Z_ Avatar answered Sep 30 '22 06:09

C_Z_


I've also been searching for this kind of tool for ages, not so much for collapsing categorical groups, but for collapsing internal nodes based on a numerical support value.

The di2multi function in the ape package can collapse nodes to polytomies, but it currently can only does this by branch length threshold. Here is a rough adaptation that allows collapsing by a node support value threshold instead (default threshold = 0.5).

Use at your own risk, but it works for me on my rooted Bayesian tree.

di2multi4node <- function (phy, tol = 0.5) 
  # Adapted di2multi function from the ape package to plot polytomies
  # based on numeric node support values
  # (di2multi does this based on edge lengths)
  # Needs adjustment for unrooted trees as currently skips the first edge
{
  if (is.null(phy$edge.length)) 
    stop("the tree has no branch length")
  if (is.na(as.numeric(phy$node.label[2])))
    stop("node labels can't be converted to numeric values")
  if (is.null(phy$node.label))
    stop("the tree has no node labels")
  ind <- which(phy$edge[, 2] > length(phy$tip.label))[as.numeric(phy$node.label[2:length(phy$node.label)]) < tol]
  n <- length(ind)
  if (!n) 
    return(phy)
  foo <- function(ancestor, des2del) {
    wh <- which(phy$edge[, 1] == des2del)
    for (k in wh) {
      if (phy$edge[k, 2] %in% node2del) 
        foo(ancestor, phy$edge[k, 2])
      else phy$edge[k, 1] <<- ancestor
    }
  }
  node2del <- phy$edge[ind, 2]
  anc <- phy$edge[ind, 1]
  for (i in 1:n) {
    if (anc[i] %in% node2del) 
      next
    foo(anc[i], node2del[i])
  }
  phy$edge <- phy$edge[-ind, ]
  phy$edge.length <- phy$edge.length[-ind]
  phy$Nnode <- phy$Nnode - n
  sel <- phy$edge > min(node2del)
  for (i in which(sel)) phy$edge[i] <- phy$edge[i] - sum(node2del < 
                                                           phy$edge[i])
  if (!is.null(phy$node.label)) 
    phy$node.label <- phy$node.label[-(node2del - length(phy$tip.label))]
  phy
}
like image 24
antechinus Avatar answered Sep 30 '22 06:09

antechinus