Making simple phylogenetic dendrogram (tree) from a list of species

I want to make a simple phylogenetic tree as a teaching example for a marine biology course. I have a list of species with their taxonomic ranks:

    Group <- c("Benthos","Benthos","Benthos","Benthos","Benthos","Benthos","Zooplankton","Zooplankton","Zooplankton","Zooplankton",
    "Zooplankton","Zooplankton","Fish","Fish","Fish","Fish","Fish","Fish","Phytoplankton","Phytoplankton","Phytoplankton","Phytoplankton")
    Domain <- rep("Eukaryota", length(Group))
    Kingdom <- c(rep("Animalia", 18), rep("Chromalveolata", 4))
    Phylum <- c("Annelida","Annelida","Arthropoda","Arthropoda","Porifera","Sipunculida","Arthropoda","Arthropoda","Arthropoda",
    "Arthropoda","Echinodermata","Chordata","Chordata","Chordata","Chordata","Chordata","Chordata","Chordata","Heterokontophyta",
    "Heterokontophyta","Heterokontophyta","Dinoflagellata")
    Class <- c("Polychaeta","Polychaeta","Malacostraca","Malacostraca","Demospongiae","NA","Malacostraca","Malacostraca",
    "Malacostraca","Maxillopoda","Ophiuroidea","Actinopterygii","Chondrichthyes","Chondrichthyes","Chondrichthyes","Actinopterygii",
    "Actinopterygii","Actinopterygii","Bacillariophyceae","Bacillariophyceae","Prymnesiophyceae","NA")
    Order <- c("NA","NA","Amphipoda","Cumacea","NA","NA","Amphipoda","Decapoda","Euphausiacea","Calanoida","NA","Gadiformes",
    "NA","NA","NA","NA","Gadiformes","Gadiformes","NA","NA","NA","NA")
    Species <- c("Nephtys sp.","Nereis sp.","Gammarus sp.","Diastylis sp.","Axinella sp.","Ph. Sipunculida","Themisto abyssorum","Decapod larvae (Zoea)",
    "Thysanoessa sp.","Centropages typicus","Ophiuroidea larvae","Gadus morhua eggs / larvae","Etmopterus spinax","Amblyraja radiata",
    "Chimaera monstrosa","Clupea harengus","Melanogrammus aeglefinus","Gadus morhua","Thalassiosira sp.","Cylindrotheca closterium",
    "Phaeocystis pouchetii","Ph. Dinoflagellata")
    dat <- data.frame(Group, Domain, Kingdom, Phylum, Class, Order, Species)
    dat

I would like to get a dendrogram (cluster analysis) that uses Domain as the first cutting point, Kingdom as the second, Phylum as the third, and so on. Missing values should be ignored (no cutting point, just a straight line instead). Group should be used as a coloring category for the labels.

I am a bit uncertain how to make a distance matrix from this data frame. There are a lot of phylogenetic tree packages for R, but they seem to want Newick data, DNA sequences, or other advanced information, so help with this would be appreciated.
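
The distance-matrix idea can be sketched in base R without any phylogenetics package. This is only a minimal sketch under stated assumptions: `tax` is a hypothetical four-species subset of the list above, and the distance between two species is taken to be the number of ranks below the deepest rank they share. It does not handle the "NA" entries, which would need to be dealt with separately.

```r
# Hypothetical toy subset of the species list; ranks ordered from broad to narrow
tax <- data.frame(
  Kingdom = c("Animalia", "Animalia", "Animalia", "Chromalveolata"),
  Phylum  = c("Chordata", "Chordata", "Arthropoda", "Heterokontophyta"),
  Species = c("Gadus morhua", "Clupea harengus", "Gammarus sp.", "Thalassiosira sp."),
  stringsAsFactors = FALSE
)
ranks <- c("Kingdom", "Phylum", "Species")

# Assumed distance: number of ranks below the deepest rank two species share
n <- nrow(tax)
d <- matrix(0, n, n, dimnames = list(tax$Species, tax$Species))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    same <- unlist(tax[i, ranks]) == unlist(tax[j, ranks])
    # Count the leading ranks that agree before the first mismatch
    shared <- if (all(same)) length(ranks) else which(!same)[1] - 1
    d[i, j] <- length(ranks) - shared
  }
}

# Hierarchical clustering on the taxonomic distances
hc <- hclust(as.dist(d), method = "average")
```

Calling `plot(as.dendrogram(hc), horiz = TRUE)` afterwards should draw the tree, with the two fish joined first and the diatom joined last.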

Mikko asked Mar 28 '12 09:03


2 Answers

It's probably a bit lame to answer my own question, but I found an easier solution. Maybe it helps someone one day.

library(ape)
taxa <- as.phylo(~Kingdom/Phylum/Class/Order/Species, data = dat)

col.grp <- merge(data.frame(Species = taxa$tip.label), dat[c("Species", "Group")], by = "Species", sort = FALSE)

cols <- ifelse(col.grp$Group == "Benthos", "burlywood4", ifelse(col.grp$Group == "Zooplankton", "blueviolet", ifelse(col.grp$Group == "Fish", "dodgerblue", ifelse(col.grp$Group == "Phytoplankton", "darkolivegreen2", ""))))

plot(taxa, type = "cladogram", tip.col = cols)

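Note that all columns have to be factors. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so on a current R version create dat with stringsAsFactors = TRUE. This demonstrates the workflow with R: it can take a week to figure something out, although the code itself is just a couple of lines =)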
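
As a possible variation, the factor conversion and the colour lookup can be sketched in base R. This is an illustration only: `toy`, `tips` and `grp.cols` are hypothetical names, with `tips` standing in for `taxa$tip.label`, and a named vector with `match()` is swapped in for the nested `ifelse()` calls.

```r
# Hypothetical three-row stand-in for dat, built with character columns
toy <- data.frame(
  Species = c("Gadus morhua", "Gammarus sp.", "Thalassiosira sp."),
  Group   = c("Fish", "Benthos", "Phytoplankton"),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor
is.chr <- vapply(toy, is.character, logical(1))
toy[is.chr] <- lapply(toy[is.chr], factor)

# Named vector: one colour per Group (same colours as in the code above)
grp.cols <- c(Benthos = "burlywood4", Zooplankton = "blueviolet",
              Fish = "dodgerblue", Phytoplankton = "darkolivegreen2")

# Stand-in for taxa$tip.label, deliberately in a different order than toy
tips <- c("Thalassiosira sp.", "Gadus morhua", "Gammarus sp.")

# match() finds the row of each tip, so the colours follow the tip order
cols <- unname(grp.cols[as.character(toy$Group[match(tips, toy$Species)])])
```

The resulting `cols` vector could then be passed as `tip.col`, and adding a new group only means adding one entry to `grp.cols`.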

Mikko answered Sep 20 '22 13:09


If you wanted to draw the tree by hand (this is probably not the best way to do it), you could start as follows. It is not a complete answer: the colours are missing, and the edges are too long. This assumes that the data has already been sorted, so that the rows belonging to the same taxon are adjacent.

# Data: remove Group
dat <- data.frame(Domain, Kingdom, Phylum, Class, Order, Species)

# Start a new plot
par(mar=c(0,0,0,0))
plot(NA, xlim=c(0,ncol(dat)+1), ylim=c(0,nrow(dat)+1), 
  type="n", axes=FALSE, xlab="", ylab="", main="")

# Compute the position of each node and find all the edges to draw
positions <- NULL
links <- NULL
for(k in 1:ncol(dat)) {
  y <- tapply(1:nrow(dat), dat[,k], mean)
  y <- y[ names(y) != "NA" ]
  positions <- rbind( positions, data.frame(
    name = names(y),
    x = k,
    y = y
  ))
}
links <- apply( dat, 1, function(u) { 
  u <- u[ !is.na(u) & u != "NA" ]
  cbind(u[-length(u)],u[-1]) 
} )
links <- do.call(rbind, links)
rownames(links) <- NULL
links <- unique(links[ order(links[,1], links[,2]), ])

# Draw the edges
for(i in 1:nrow(links)) {
  from <- positions[links[i,1],]
  to   <- positions[links[i,2],]
  lines( c(from$x, from$x, to$x), c(from$y, to$y, to$y) )
}

# Add the text
text(positions$x, positions$y, label=positions$name)
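
The node placement used above can be seen in isolation on a tiny table. This is a minimal sketch with a hypothetical sorted data frame `toy`: each leaf sits at its row number, and each inner node sits at the mean row number of the leaves below it, which is what the `tapply()` call computes.

```r
# Hypothetical sorted toy table: one row per leaf, ranks from broad to narrow
toy <- data.frame(
  Phylum  = c("Arthropoda", "Arthropoda", "Chordata"),
  Species = c("Gammarus sp.", "Diastylis sp.", "Gadus morhua")
)

# Mean row index per taxon gives the vertical position of each node
y.phylum  <- tapply(seq_len(nrow(toy)), toy$Phylum, mean)
y.species <- tapply(seq_len(nrow(toy)), toy$Species, mean)
```

Here Arthropoda is placed halfway between its two leaves (rows 1 and 2), while Chordata shares the position of its single leaf in row 3. If the rows of a taxon were not adjacent, its node would land between unrelated leaves, which is why the data has to be sorted.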

Vincent Zoonekynd answered Sep 19 '22 13:09