Given a distance matrix holding the pairwise dissimilarity between professors:
       prof1  prof2  prof3
prof1  0      0.8    0.9
prof2  0.8    0      0.2
prof3  0.9    0.2    0
I need to perform hierarchical clustering on this data, which is stored as a 2-D matrix:
data_matrix=[[0,0.8,0.9],[0.8,0,0.2],[0.9,0.2,0]]
I tried sklearn.cluster.AgglomerativeClustering, but it treats the 3 rows as 3 separate vectors rather than as a distance matrix. Can this be done with AgglomerativeClustering or with scipy.cluster.hierarchy?
Yes, you can do it with sklearn. You need to set:
affinity='precomputed', to use a matrix of distances (this parameter is called metric in scikit-learn 1.2 and later, and the affinity name was removed in 1.4)
linkage='complete' or 'average', because the default linkage (Ward) works only on coordinate input.
With precomputed affinity, the input matrix is interpreted as a matrix of distances between observations. The following code
from sklearn.cluster import AgglomerativeClustering
data_matrix = [[0,0.8,0.9],[0.8,0,0.2],[0.9,0.2,0]]
# scikit-learn >= 1.2 takes metric=...; on older versions pass affinity='precomputed' instead
model = AgglomerativeClustering(metric='precomputed', n_clusters=2, linkage='complete').fit(data_matrix)
print(model.labels_)
will return labels [1 0 0]: the 1st professor goes to one cluster, and the 2nd and 3rd go to another.