
Clustering list for hclust function

Tags: r, hclust

Using plot(hclust(dist(x))), I was able to draw a cluster tree. It works, but I would like to get a list of all clusters rather than a tree diagram, because I have a huge amount of data (around 150K nodes) and the plot gets messy.

In other words, if a, b, c form one cluster and d, e, f, g form another, I would like to get something like this:

1 a,b,c
2 d,e,f,g

Please note that this is not exactly the output I need; it is just an example. I would simply like to get a list of clusters instead of a tree plot. It could be a vector, a matrix, or just numbers that show which group each element belongs to.

How is this possible?

asked Jun 29 '11 09:06 by dave



2 Answers

I will use a dataset that ships with R (USArrests) to demonstrate how to cut a tree into a desired number of pieces. The result is a named vector that gives the group of each observation.

Construct a hclust object.

hc <- hclust(dist(USArrests), "ave")
#plot(hc)

You can now cut the tree into as many branches as you want. For my next trick, I will split the tree into two groups. You set the number of groups with the k parameter. See ?cutree and the parameter h (the height at which to cut), which may be more useful to you (compare cutree(hc, k = 2) == cutree(hc, h = 110)).

cutree(hc, k = 2)
       Alabama         Alaska        Arizona       Arkansas     California
             1              1              1              2              1
      Colorado    Connecticut       Delaware        Florida        Georgia
             2              2              1              1              2
        Hawaii          Idaho       Illinois        Indiana           Iowa
             2              2              1              2              2
        Kansas       Kentucky      Louisiana          Maine       Maryland
             2              2              1              2              1
 Massachusetts       Michigan      Minnesota    Mississippi       Missouri
             2              1              2              1              2
       Montana       Nebraska         Nevada  New Hampshire     New Jersey
             2              2              1              2              2
    New Mexico       New York North Carolina   North Dakota           Ohio
             1              1              1              2              2
      Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina
             2              2              2              2              1
  South Dakota      Tennessee          Texas           Utah        Vermont
             2              2              2              2              2
      Virginia     Washington  West Virginia      Wisconsin        Wyoming
             2              2              2              2              2
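To get the clusters as lists of members, closer to the "1 a,b,c" layout in the question, the vector returned by cutree can be regrouped by cluster number. This is only a minimal sketch: hc is rebuilt so the snippet runs on its own, and the names groups, clusters and cluster_lines are illustrative, not part of any API.

```r
# Rebuild the tree and cut it into two groups (same calls as above)
hc <- hclust(dist(USArrests), "ave")
groups <- cutree(hc, k = 2)

# split() collects the observation names that share a cluster number,
# giving a list with one character vector per cluster
clusters <- split(names(groups), groups)

# One text line per cluster: cluster number, then comma-separated members
cluster_lines <- vapply(clusters, paste, character(1), collapse = ",")
cat(paste(names(cluster_lines), cluster_lines), sep = "\n")

# Number of observations in each cluster
table(groups)
```

With a very large number of observations, the list form (clusters) or the table of sizes is probably easier to work with than the printed lines.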
answered Oct 02 '22 18:10 by Roman Luštrik


Let's say:

y <- dist(x)
clust <- hclust(y)
groups <- cutree(clust, k = 3)
x <- cbind(x, groups)

Now each record has its cluster group attached as an extra column. You can also subset the dataset by group:

x1 <- subset(x, groups == 1)
x2 <- subset(x, groups == 2)
x3 <- subset(x, groups == 3)
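If there are many groups, writing one subset call per group gets tedious. A possible alternative is split(), which returns all groups at once as a list. The sketch below assumes x is a data frame; USArrests is used here only as stand-in data, and by_group is an illustrative name.

```r
# Stand-in data: replace with your own data frame
x <- USArrests

# Cluster and cut into three groups, as above
groups <- cutree(hclust(dist(x)), k = 3)

# split() returns a list of data frames, one per cluster,
# named by cluster number ("1", "2", "3")
by_group <- split(x, groups)

# Rows per cluster, and the first few rows of cluster 1
sapply(by_group, nrow)
head(by_group[["1"]])
```

Each element of by_group plays the role of x1, x2, x3 above, and the same code works unchanged for any k.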
answered Oct 02 '22 20:10 by user2783711