
How to use 'hclust' as function call in R

I tried to wrap the clustering method in a function as follows:

mydata <- mtcars

# Here I construct hclust as a function
hclustfunc <- function(x) hclust(as.matrix(x),method="complete")

# Define distance metric
distfunc <- function(x) as.dist((1-cor(t(x)))/2)

# Obtain distance
d <- distfunc(mydata)

# Call that hclust function
fit<-hclustfunc(d)

# Later I'd do
# plot(fit)

But why does it give the following error:

Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") : 
  missing value where TRUE/FALSE needed

What's the right way to do it?

neversaint asked Dec 03 '13 05:12




1 Answer

Do read the help for functions you use. ?hclust is pretty clear that the first argument d is a dissimilarity object, not a matrix:

Arguments:

       d: a dissimilarity structure as produced by ‘dist’.
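
To make the distinction concrete, here is a minimal sketch that compares the object the question passes to hclust() with the object it expects. It reuses the question's mtcars data and correlation-based distance; the variable names res and fit are only illustrative, and the exact error text may differ between R versions, so the sketch only checks that an error occurs.

```r
# Illustration only: a "dist" object versus the plain matrix made from it
mydata <- mtcars
d <- as.dist((1 - cor(t(mydata))) / 2)

inherits(d, "dist")             # TRUE: this is what hclust() expects
inherits(as.matrix(d), "dist")  # FALSE: as.matrix() drops the "dist" class

# Passing the matrix fails; try() captures the error instead of stopping
res <- try(hclust(as.matrix(d), method = "complete"), silent = TRUE)
inherits(res, "try-error")      # TRUE

# Passing the dissimilarity object itself works
fit <- hclust(d, method = "complete")
```

The matrix has no "Size" attribute, which is what hclust() reads to find the number of observations, so the size check in the question's error message cannot be evaluated.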

Update

As the OP has now updated their question, what is needed is

hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) as.dist((1-cor(t(x)))/2)
d <- distfunc(mydata)
fit <- hclustfunc(d)
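
As a quick sanity check on the corrected version, the sketch below inspects what distfunc returns before clustering. It assumes the question's mtcars data; the checks themselves are illustrative additions, not part of the original answer.

```r
# Sketch: inspect the correlation-based dissimilarity before clustering
distfunc <- function(x) as.dist((1 - cor(t(x))) / 2)
d <- distfunc(mtcars)

# (1 - r) / 2 maps a correlation r in [-1, 1] onto [0, 1]:
# r = 1 gives 0 (identical profiles), r = -1 gives 1 (opposite profiles)
range(d)

# A "dist" object records how many observations it describes
attr(d, "Size") == nrow(mtcars)

# hclust() accepts it directly, with no as.matrix() in between
fit <- hclust(d, method = "complete")
head(fit$labels)
```

Note that t(x) makes cor() compare rows (cars) rather than columns (variables), so the tree groups cars with similar profiles across the measured variables.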

Original

What you want is

hclustfunc <- function(x, method = "complete", dmeth = "euclidean") {    
    hclust(dist(x, method = dmeth), method = method)
}

and then

fit <- hclustfunc(mydata)

works as expected. Note that you can now pass in the dissimilarity coefficient method as dmeth and the clustering method as method.
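
A short usage sketch for this wrapper follows. The particular choices ("manhattan" distance, "average" linkage, k = 3) are arbitrary picks for illustration, not recommendations from the answer.

```r
# Same wrapper as in the answer, repeated so the example is self-contained
hclustfunc <- function(x, method = "complete", dmeth = "euclidean") {
    hclust(dist(x, method = dmeth), method = method)
}

# Defaults: Euclidean distance, complete linkage
fit_default <- hclustfunc(mtcars)

# Override both the distance measure and the agglomeration method
fit_avg_man <- hclustfunc(mtcars, method = "average", dmeth = "manhattan")

# The fitted object records which settings were used
fit_avg_man$method
fit_avg_man$dist.method

# Cut the tree into k groups (k = 3 is an arbitrary choice here)
groups <- cutree(fit_avg_man, k = 3)
table(groups)

# plot(fit_avg_man)
```

Because dist() is called inside the wrapper, this version takes the raw data frame. A custom dissimilarity such as the correlation-based one in the question still has to be computed separately and passed to hclust() directly.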

Gavin Simpson answered Sep 17 '22 19:09