
Strange error of Hierarchical Clustering in R

My R program is below:

hcluster <- function(dmatrix) {
    imatrix <- NULL
    hc <- hclust(dist(dmatrix), method="average")
    for(h in sort(unique(hc$height))) {
        hc.index <- c(h,as.vector(cutree(hc,h=h)))
        imatrix <- cbind(imatrix, hc.index)
    }
    return(imatrix)
}

dmatrix_file = commandArgs(trailingOnly = TRUE)[1]
print(paste('Reading distance matrix from', dmatrix_file))
dmatrix <- as.matrix(read.csv(dmatrix_file,header=FALSE))

imatrix <- hcluster(dmatrix)
imatrix_file = paste("results",dmatrix_file,sep="-")
print(paste('Writing results to', imatrix_file))
write.table(imatrix, file=imatrix_file, sep=",", quote=FALSE, row.names=FALSE, col.names=FALSE)
print('done!')

My input is a symmetric distance matrix. When I run the program above on a distance matrix with more than a few thousand records (nothing goes wrong with a few hundred), I get this error message:

Error in cutree(hc, h = h) : 
  the 'height' component of 'tree' is not sorted
(increasingly); consider applying as.hclust() first
Calls: hcluster -> as.vector -> cutree
Execution halted

My machine has about 16 GB of RAM and 4 CPUs, so resources shouldn't be the problem.

Can anyone please tell me what the problem is? Thanks!

Kevin asked Feb 26 '12 21:02


1 Answer

I'm not much of an R wizard, but I ran into exactly this problem.

A potential answer is described here:

https://stat.ethz.ch/pipermail/r-help/2008-May/163409.html
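
For what it's worth, cutree() raises this error whenever is.unsorted(hc$height) is TRUE. Below is a minimal sketch of one possible workaround. It assumes the unsorted heights come from tiny floating-point differences between near-tied merges; I have not confirmed that this is the cause here, or that it is what the linked thread recommends. The helper name fix_heights and the digits value are my own, purely for illustration, and the random input only stands in for real data.

```r
# Hypothetical helper (not part of base R): make merge heights non-decreasing.
fix_heights <- function(hc, digits = 10) {
    # Round away floating-point noise, then enforce monotonicity with cummax().
    hc$height <- cummax(round(hc$height, digits))
    hc
}

# Stand-in data: 100 random 2-D points instead of a real distance matrix.
set.seed(1)
points <- matrix(rnorm(200), ncol = 2)

hc <- hclust(dist(points), method = "average")
hc <- fix_heights(hc)

# cutree() only accepts the tree if this check passes.
stopifnot(!is.unsorted(hc$height))

# Cut at one of the (now sorted) merge heights; one cluster id per point.
groups <- cutree(hc, h = hc$height[50])
```

If the heights in your tree are out of order by more than rounding noise, this would hide a real inversion, so it seems worth looking at diff(hc$height) before relying on it.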

plof answered Sep 28 '22 07:09