
Walking a hierarchical tree

I want to be able to "walk" (iterate) through a hierarchical cluster (see figure below and code). What I want is:

  1. A function that takes a matrix and a minimum height, say 10 in this example.

    splitme <- function(matrix, minH){
        ##Some code
    }
    
  2. Starting from the top and going down to minH, cut the tree whenever there is a new split. This is the first problem: how do I detect a new split and get its height h?

  3. At this particular h, how many clusters are there? Retrieve the clusters:

    mycl <- cutree(hr, h = x)       # hr: the hclust object; x: the height h found in step 2
    count <- length(unique(mycl))   # number of clusters at height x
    
  4. Save each of the new matrices in its own variable. This is another hard one: dynamically creating x new matrices. So perhaps a function that takes the clusters, does what needs to be done (comparisons), and returns a variable?

  5. Repeat steps 3 and 4 until minH is reached.

Figure

[Image: dendrogram produced by the code below, with branches colored red, orange, and blue]

Code

# Generate data
set.seed(12345)
desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))

data <- cbind(desc.1, desc.2, desc.3)

# Create dendrogram
d <- dist(data) 
hc <- as.dendrogram(hclust(d))

# Function to color branches
colbranches <- function(n, col)
  {
  a <- attributes(n) # Find the attributes of current node
  # Color edges with requested color
  attr(n, "edgePar") <- c(a$edgePar, list(col=col, lwd=2))
  n # Don't forget to return the node!
  }

# Color the first sub-branch of the first branch in red,
# the second sub-branch in orange and the second branch in blue
hc[[1]][[1]] = dendrapply(hc[[1]][[1]], colbranches, "red")
hc[[1]][[2]] = dendrapply(hc[[1]][[2]], colbranches, "orange")
hc[[2]] = dendrapply(hc[[2]], colbranches, "blue")

# Plot
plot(hc)
Asked by StudentOfScience on Nov 20 '13 at 03:11


1 Answer

I think what you essentially need is the cophenetic distance matrix of the dendrogram, which cophenetic() computes. Its distinct values are the heights of all the splitting points, and from there you can easily walk through the tree. My attempt below stores all the submatrices in a nested list called submatrices. The first level has one element per splitting point; the second level holds the submatrices produced at that splitting point. For example, all the submatrices from the 1st splitting point (grey and blue clusters) are in submatrices[[1]], and the first of them (red cluster) is submatrices[[1]][[1]]. Note that each stored element is the dist() object computed from the rows of that cluster.
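Before the full function, here is a quick check of that claim on toy data. It is only a sketch: the matrix toy and the variable names are made up for illustration, and it assumes no two merges happen at exactly the same height. It shows that the distinct cophenetic distances are the merge heights stored in the hclust object, and that cutting at the i-th highest of them gives i clusters.

```r
# Toy data (hypothetical, not the question's data): 6 points in 2 dimensions
set.seed(1)
toy <- matrix(rnorm(12), ncol = 2)

cl <- hclust(dist(toy))

# Distinct cophenetic distances, highest first
split_heights <- sort(unique(as.vector(cophenetic(cl))), decreasing = TRUE)

# They coincide with the merge heights stored in the hclust object
same_heights <- all.equal(split_heights,
                          sort(unique(cl$height), decreasing = TRUE))

# Number of clusters obtained when cutting at each split height.
# The highest height returns the whole tree as a single cluster,
# the next one returns two clusters, and so on.
n_clusters <- sapply(split_heights, function(h) max(cutree(cl, h = h)))
```

This is why the loop in the function starts at the second height: the first one never splits anything.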

splitme <- function(data, minH){
  ## Compute the distance matrix and the hierarchical clustering
  d <- dist(data)
  cl <- hclust(d)

  ## Get the cophenetic distance matrix (cdm).
  ## It is left unrounded so that cutree() is called at the exact merge heights.
  cdm <- cophenetic(cl)

  ## Get the heights of the splitting points (sps), highest first
  sps <- sort(unique(as.vector(cdm)), decreasing = TRUE)

  ## This list stores the submatrices extracted at each splitting point
  ## (the top splitting point is the 1st, the bottom splitting point the last)
  submatrices <- list()

  ## Iterate/walk the dendrogram.
  ## Start from 2: cutting at sps[1] gives the entire dendrogram as one cluster.
  i <- 2
  ## The bounds check ends the loop when minH is below every split height
  while(i <= length(sps) && sps[i] > minH){
    membership <- cutree(cl, h = sps[i]) # Cut the tree at the splitting point
    lst <- list() # Submatrices extracted at this splitting point
    for(j in 1:max(membership)){
      member <- which(membership == j) # Rows of data that belong to cluster j
      # drop = FALSE keeps a single-row cluster as a matrix
      dm <- dist(data[member, , drop = FALSE])
      lst <- append(lst, list(dm)) # Append the cluster's distance matrix to lst
    }
    submatrices <- append(submatrices, list(lst)) # Append lst to the submatrices list
    i <- i + 1
  }
  return(submatrices)
}
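If you would rather get the data rows of each cluster than their distance matrices, the same walk can be written more compactly. The following is a sketch, not part of the answer above: walk_splits is a hypothetical name, and it assumes data is a matrix or data frame with one observation per row. It uses the merge heights in cl$height, which carry the same values as the cophenetic distances.

```r
# Hypothetical helper: for every split height above minH, return the rows
# of `data` grouped by cluster, as a nested list.
walk_splits <- function(data, minH) {
  cl <- hclust(dist(data))
  heights <- sort(unique(cl$height), decreasing = TRUE)
  heights <- heights[-1]               # the top height yields a single cluster
  heights <- heights[heights > minH]   # stop once minH is reached
  lapply(heights, function(h) {
    membership <- cutree(cl, h = h)
    # One submatrix per cluster; drop = FALSE keeps single-row clusters as matrices
    lapply(split(seq_len(nrow(data)), membership),
           function(rows) data[rows, , drop = FALSE])
  })
}

# The example data from the question
set.seed(12345)
desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))
data <- cbind(desc.1, desc.2, desc.3)

res <- walk_splits(data, minH = 10)

# Cluster sizes at each split height above minH
sizes <- lapply(res, function(level) sapply(level, nrow))
```

Returning a nested list avoids having to create a variable number of matrices dynamically: res[[i]][[j]] is the j-th cluster at the i-th split.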
Answered by jinlong on Nov 04 '22 at 11:11