sklearn Hierarchical Agglomerative Clustering using similarity matrix

Question

Given a distance matrix that encodes how similar various professors are to each other:

              prof1     prof2     prof3
       prof1     0        0.8     0.9
       prof2     0.8      0       0.2
       prof3     0.9      0.2     0

I need to perform hierarchical clustering on this data, which is stored as a 2-d matrix:

       data_matrix=[[0,0.8,0.9],[0.8,0,0.2],[0.9,0.2,0]]

I tried sklearn.cluster.AgglomerativeClustering, but it treats the 3 rows as 3 separate vectors rather than as a distance matrix. Can this be done with AgglomerativeClustering, or with scipy.cluster.hierarchy?

David Dale · Accepted Answer

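Yes, you can do it with sklearn. You need to set:

metric='precomputed' (spelled affinity='precomputed' in scikit-learn versions before 1.2), so that the input is treated as a matrix of distances
linkage='complete' or 'average', because the default linkage (Ward) works only on coordinate input.

With a precomputed metric, the input matrix is interpreted as a matrix of distances between observations. The following code

from sklearn.cluster import AgglomerativeClustering
# Symmetric pairwise distance matrix: entry [i][j] is the distance between professors i and j
data_matrix = [[0,0.8,0.9],[0.8,0,0.2],[0.9,0.2,0]]
# On scikit-learn < 1.2, pass affinity='precomputed' instead of metric='precomputed'
model = AgglomerativeClustering(metric='precomputed', n_clusters=2, linkage='complete').fit(data_matrix)
print(model.labels_)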

will return labels [1 0 0]: the 1st professor goes to one cluster, and the 2nd and 3rd go to another.
