When using a clustering algorithm, you always have to specify a shutoff parameter.
I am currently using agglomerative clustering with scikit-learn, and the only shutoff parameter I can see is the number of clusters.
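from sklearn.cluster import AgglomerativeClustering
agg_clust = AgglomerativeClustering(n_clusters=N)  # N: desired number of clusters
y_pred = agg_clust.fit_predict(matrix)  # matrix: (n_samples, n_features) array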
But I would like to find an algorithm where I specify the maximum distance between elements of a cluster, rather than the number of clusters. The algorithm would then simply keep agglomerating clusters until that maximum distance is reached.
Any suggestions?
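To make the requested behaviour concrete, here is a naive from-scratch sketch (not a library API; the function name `agglomerate_max_distance` is hypothetical). It assumes Euclidean distance and reads "maximum distance within a cluster" as the largest pairwise distance between members, which corresponds to complete linkage. It is roughly O(n³) and only meant for small inputs.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def agglomerate_max_distance(X, max_dist):
    """Greedy complete-linkage agglomeration that stops at a distance limit.

    Repeatedly merges the pair of clusters whose widest cross-cluster pair is
    smallest, and stops as soon as that smallest merge would exceed max_dist.
    Every returned cluster therefore has all pairwise distances <= max_dist.
    """
    D = squareform(pdist(X))                  # full pairwise Euclidean distance matrix
    clusters = [[i] for i in range(len(X))]   # start with one singleton per point
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: the widest pair across the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > max_dist:                      # even the best merge is too wide: stop
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                       # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Hypothetical toy data: two tight pairs and one isolated point
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]])
print(agglomerate_max_distance(X, 1.5))  # [0 0 1 1 2]
```

Because the smallest possible merge is checked first, stopping at the first merge wider than `max_dist` means no other merge could have stayed within the limit either.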
What you are looking for is implemented in scipy.cluster.hierarchy (see its documentation).
So here is how you can do it:
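from scipy.cluster.hierarchy import linkage, fcluster
# matrix: (n_samples, n_features) array; t: maximum distance allowed within a cluster
# method='complete' makes t bound every pairwise distance inside a cluster
# (the default, method='single', only bounds nearest-neighbour chaining distances)
y_pred = fcluster(linkage(matrix, method='complete'), t, criterion='distance')
# or, more directly
from scipy.cluster.hierarchy import fclusterdata
y_pred = fclusterdata(matrix, t, criterion='distance', method='complete')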
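A small self-contained check of the `fclusterdata` approach, using made-up toy data (`X` and `t` are illustrative values, not from the question). It clusters with complete linkage and then prints, for each flat cluster, its size and the largest pairwise distance inside it, which should never exceed `t`.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata
from scipy.spatial.distance import pdist

# Hypothetical toy data: two tight pairs and one isolated point
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]])
t = 1.5  # maximum allowed distance between two points in the same cluster

# Complete linkage: a cluster's merge height is its largest pairwise distance,
# so cutting the tree at t bounds every intra-cluster distance by t.
labels = fclusterdata(X, t, criterion='distance', method='complete')

for k in np.unique(labels):
    members = X[labels == k]
    # pdist of a single point is empty, so treat its width as 0
    widest = pdist(members).max() if len(members) > 1 else 0.0
    print(k, len(members), widest)
```

Note that the cluster IDs returned by `fcluster`/`fclusterdata` start at 1. Recent scikit-learn releases (0.21 and later) also accept `AgglomerativeClustering(n_clusters=None, distance_threshold=t, linkage='complete')`, which stops merging once the linkage distance reaches the threshold.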