My R program is as follows:
hcluster <- function(dmatrix) {
  imatrix <- NULL
  hc <- hclust(dist(dmatrix), method="average")
  for(h in sort(unique(hc$height))) {
    hc.index <- c(h,as.vector(cutree(hc,h=h)))
    imatrix <- cbind(imatrix, hc.index)
  }
  return(imatrix)
}
dmatrix_file = commandArgs(trailingOnly = TRUE)[1]
print(paste('Reading distance matrix from', dmatrix_file))
dmatrix <- as.matrix(read.csv(dmatrix_file,header=FALSE))
imatrix <- hcluster(dmatrix)
imatrix_file = paste("results",dmatrix_file,sep="-")
print(paste('Writing results to', imatrix_file))
write.table(imatrix, file=imatrix_file, sep=",", quote=FALSE, row.names=FALSE, col.names=FALSE)
print('done!')
My input is a distance matrix (symmetric, of course). When I run the program above on a distance matrix with more than a few thousand records (nothing goes wrong with a few hundred), it gives me this error message:
Error in cutree(hc, h = h) :
the 'height' component of 'tree' is not sorted
(increasingly); consider applying as.hclust() first
Calls: hcluster -> as.vector -> cutree
Execution halted
My machine has about 16 GB of RAM and 4 CPUs, so resources should not be the problem.
Can anyone please tell me what the problem is? Thanks!
I'm not much of an R wizard, but I ran into exactly this problem.
A potential answer is described here:
https://stat.ethz.ch/pipermail/r-help/2008-May/163409.html
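For illustration, here is a minimal sketch of one possible workaround. It assumes (this is not confirmed by the question or the linked thread) that the merge heights dip only very slightly out of order because of floating-point rounding. The helper name monotone_heights and the toy data are hypothetical; is.unsorted(), cummax(), hclust() and cutree() are base R / stats functions. The inversion is simulated here so that the example is reproducible.

```r
# Hypothetical helper (not part of R or the original post): force the merge
# heights to be non-decreasing by replacing each one with the running maximum.
# Only sensible if the inversions are tiny rounding artefacts (an assumption).
monotone_heights <- function(hc) {
  hc$height <- cummax(hc$height)
  hc
}

# Toy data standing in for the real input (assumption: 50 random 2-D points).
set.seed(42)
pts <- matrix(rnorm(100), ncol = 2)
hc <- hclust(dist(pts), method = "average")

# Simulate the failure by nudging one height just below its predecessor.
broken <- hc
broken$height[10] <- broken$height[9] - 1e-12

# cutree(h = ...) stops with the "not sorted" error when this is TRUE.
is.unsorted(broken$height)

# Check how large the worst inversion is before smoothing it away.
min(diff(broken$height))

# After the fix, cutting by height works again.
fixed <- monotone_heights(broken)
groups <- cutree(fixed, h = fixed$height[25])
```

In the original program, the same check could be applied to hc right after the hclust() call and before the loop over heights. If min(diff(hc$height)) turns out to be far from zero, the inversions are probably not rounding noise and this sketch would hide a real problem. Separately, since the input is already a distance matrix, as.dist(dmatrix) may be what was intended rather than dist(dmatrix), which computes distances between the rows of its argument.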