 

sklearn Hierarchical Agglomerative Clustering using similarity matrix

Given a distance matrix holding the pairwise distances (dissimilarities) between several professors:

              prof1     prof2     prof3
       prof1     0        0.8     0.9
       prof2     0.8      0       0.2
       prof3     0.9      0.2     0

I need to perform hierarchical clustering on this data, which is stored as a 2-D matrix:

       data_matrix=[[0,0.8,0.9],[0.8,0,0.2],[0.9,0.2,0]]

I tried sklearn.cluster.AgglomerativeClustering, but it treats the 3 rows as 3 separate feature vectors rather than as a distance matrix. Can this be done with AgglomerativeClustering, or with scipy.cluster.hierarchy?

ICoder asked Nov 16 '17 03:11



1 Answer

Yes, you can do it with sklearn. You need to set:

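  • metric='precomputed' (called affinity='precomputed' in scikit-learn versions before 1.2), to use a matrix of distances
  • linkage='complete', 'average' or 'single', because the default linkage (Ward) works only on coordinate input, not on precomputed distances.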
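To see why these linkages work on a precomputed matrix, here is a minimal pure-Python sketch of agglomerative clustering that reads nothing but pairwise distances. The helper names cluster_distance and agglomerate are made up for this illustration and are not part of sklearn; the real implementation is more efficient and differs in detail.

```python
def cluster_distance(dist, a, b, linkage="complete"):
    """Distance between clusters a and b (lists of row indices into dist)."""
    pairwise = [dist[i][j] for i in a for j in b]
    if linkage == "complete":
        return max(pairwise)                  # farthest pair of members
    if linkage == "average":
        return sum(pairwise) / len(pairwise)  # mean over all member pairs
    raise ValueError("unsupported linkage: " + linkage)


def agglomerate(dist, n_clusters, linkage="complete"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]  # start with one cluster per item
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        _, p, q = min(
            (cluster_distance(dist, clusters[p], clusters[q], linkage), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


data_matrix = [[0, 0.8, 0.9], [0.8, 0, 0.2], [0.9, 0.2, 0]]
print(agglomerate(data_matrix, 2))  # [[0], [1, 2]]
```

Both linkages only ever look up entries of the matrix, so no coordinates are needed. Ward, by contrast, is defined through within-cluster variance, which is why it expects feature vectors.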

With a precomputed metric, the input matrix is interpreted as a matrix of distances between observations. The following code

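from sklearn.cluster import AgglomerativeClustering

# Symmetric distance matrix with a zero diagonal
data_matrix = [[0, 0.8, 0.9], [0.8, 0, 0.2], [0.9, 0.2, 0]]
# scikit-learn >= 1.2 takes metric=; older versions take affinity='precomputed' instead
model = AgglomerativeClustering(metric='precomputed', n_clusters=2, linkage='complete').fit(data_matrix)
print(model.labels_)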

will return labels [1 0 0]: the 1st professor goes to one cluster, and the 2nd and 3rd go to the other.

David Dale answered Sep 28 '22 18:09
