
How to draw hierarchical clustering?

Tags:

r

I have the following dataset:

data<-data.frame(X=c(1,2,3,4),Y=c(1,3,2,1))
for(i in 1:nrow(data)){ data[i,i]<-NA}
colnames(data) <- c("A","B","C","D")
rownames(data) <- c("A","B","C","D")
plot(hclust(dist(data)))

and then the result is the below image:

enter image description here

But I am wondering how this plot is drawn, so I am trying to build the dendrogram step by step. The distance matrix at the beginning is as follows:

enter image description here

At each step we find the two points (or clusters) with the minimum distance and merge them into a single cluster:

enter image description here

So the first merge is B and C. Then we update the distance matrix:

enter image description here

Again we find the two clusters with the minimum distance, which are D and the cluster {B, C}:

enter image description here

Again we update the distance matrix

enter image description here

As a result, I should have the following merges:

  1. B and C
  2. {B, C} and D
  3. {B, C, D} and A
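
The steps above can be sketched by hand in R. This is only a minimal illustration, assuming that the matrix update keeps the smaller of the two distances (single linkage), which is what the merge order listed above implies. The names pts, merges, and heights are made up for this sketch and are not part of any library.

```r
# Coordinates from the question; row names label the points.
pts <- data.frame(X = c(1, 2, 3, 4), Y = c(1, 3, 2, 1),
                  row.names = c("A", "B", "C", "D"))
d <- as.matrix(dist(pts))   # full symmetric Euclidean distance matrix
diag(d) <- Inf              # a cluster is never merged with itself

merges  <- character(0)     # human-readable record of each merge
heights <- numeric(0)       # distance at which each merge happens

while (nrow(d) > 1) {
  # Closest pair of clusters; on ties, take the first match in column-major order.
  idx <- which(d == min(d), arr.ind = TRUE)[1, ]
  i <- min(idx)
  j <- max(idx)
  merges  <- c(merges, paste(rownames(d)[i], rownames(d)[j], sep = " + "))
  heights <- c(heights, d[i, j])

  # Assumed update rule (single linkage): keep the smaller of the two distances.
  merged <- pmin(d[i, ], d[j, ])
  d[i, ] <- merged
  d[, i] <- merged
  d[i, i] <- Inf
  rownames(d)[i] <- colnames(d)[i] <- paste0(rownames(d)[i], ",", rownames(d)[j])
  d <- d[-j, -j, drop = FALSE]   # drop the cluster that was absorbed
}

print(merges)
print(round(heights, 3))
```

Note that B-C and C-D are tied at the same distance, so the first merge depends on how ties are broken; this sketch simply takes the first match it finds.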

But this contradicts the plot that R produced. How can the difference be explained?

asked Apr 28 '17 14:04 by Sal-laS



1 Answer

Updated Response - Using single linkage rather than the default complete linkage.

I'll do my best to explain how I see this working. I believe this is as simple as the method argument used in hclust. The default method for hclust does not follow the algorithm that you laid out but we can adjust the method so it does.

But first, I am getting an error on the plot you are trying to make:

> data<-data.frame(X=c(1,2,3,4),Y=c(1,3,2,1))
> for(i in 1:nrow(data)){ data[i,i]<-NA}
> colnames(data) <- c("A","B","C","D")
> rownames(data) <- c("A","B","C","D")
> plot(hclust(dist(data)))
Error in hclust(dist(data)) : 
  NA/NaN/Inf in foreign function call (arg 11)

What is your intention with the for(i in 1:nrow(data)){ data[i,i]<-NA} line? After that line, your data object looks like this:

   X  Y V3 V4
1 NA  1 NA NA
2  2 NA NA NA
3  3  2 NA NA
4  4  1 NA NA
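
A quick way to confirm that these NA values are what hclust is complaining about is to look for missing entries in the distance object itself. This is just a sketch; d_bad and has_na are names I made up for illustration.

```r
# Rebuild the data frame exactly as in the question.
data <- data.frame(X = c(1, 2, 3, 4), Y = c(1, 3, 2, 1))
for (i in 1:nrow(data)) { data[i, i] <- NA }

# Rows 1 and 2 share no column where both are non-missing,
# so dist() cannot compute a distance between them.
d_bad  <- dist(data)
has_na <- any(is.na(d_bad))
print(has_na)
```

If has_na is TRUE, hclust will refuse the input, which matches the error above.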

However, if we start from the following code instead, we get the desired tree:

dt<-data.frame(X = c(1, 2, 3, 4), Y = c(1, 3, 2, 1))
rownames(dt) <- c("A", "B", "C", "D")
dt<-dist(dt)
plot(hclust(dt, method = "single"))

enter image description here

NOTE the change in the hclust call to method = "single". The default is method = "complete". Complete linkage does not measure the distance between two clusters by their closest pair of points, as your algorithm does, but by their farthest pair; at each step it merges the two clusters for which that largest pairwise distance is smallest. Here is some material from the fantastic Introduction to Statistical Learning with Applications in R, which describes the various linkage methods available:

enter image description here

This text, by James, Witten, Hastie, and Tibshirani, is available as a free download at the link above. The section on hierarchical clustering starts on page 390. Please let me know if this helps clear things up.
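
To see the effect of the linkage method without relying on the plots, the merge heights stored in the hclust result can be compared directly. This is a small sketch; pts, hc_single, and hc_complete are just names chosen here.

```r
pts <- data.frame(X = c(1, 2, 3, 4), Y = c(1, 3, 2, 1),
                  row.names = c("A", "B", "C", "D"))
d <- dist(pts)

hc_single   <- hclust(d, method = "single")
hc_complete <- hclust(d)   # method = "complete" is the default

# $height holds the distance at which each successive merge happens.
h_single   <- hc_single$height
h_complete <- hc_complete$height
print(round(h_single, 3))
print(round(h_complete, 3))

# $merge records which observations or clusters were joined at each step.
print(hc_single$merge)
print(hc_complete$merge)
```

With these four points the single-linkage heights come out at about 1.414, 1.414 and 2.236, while complete linkage gives about 1.414, 2.236 and 3.000, which is why the two dendrograms look different.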

Original Response

I think you are calling the dist function in the wrong manner and perhaps too many times. Try this:

dt<-data.frame(X=c(1,2,3,4),Y=c(1,3,2,1))
rownames(dt) <- c("A","B","C","D")
dt<-dist(dt)
plot(hclust(dt))

enter image description here

Effectively, you were calling dist on an object that was already of class dist, turning the result into a matrix, and then calling dist on it again within your call to plot.

We can examine just the distance object as follows:

> dt
         A        B        C
B 2.236068                  
C 2.236068 1.414214         
D 3.000000 2.828427 1.414214

There is no need to call dist on this object again before passing it to the hclust function.
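
If in doubt, you can check what kind of object you are holding before handing it to hclust. A minimal sketch (m is just a name used here):

```r
dt <- data.frame(X = c(1, 2, 3, 4), Y = c(1, 3, 2, 1),
                 row.names = c("A", "B", "C", "D"))
dt <- dist(dt)

# TRUE means dt is already a distance object and can go straight into hclust.
print(inherits(dt, "dist"))

# as.matrix() gives the full symmetric view, which is easier to read by eye.
m <- as.matrix(dt)
print(round(m, 3))
```

The matrix form is only for inspection; hclust expects the dist object itself.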

answered Oct 01 '22 04:10 by Nick Criswell