I am using SciPy's hierarchical agglomerative clustering methods to cluster a m x n matrix of features, but after the clustering is complete, I can't seem to figure out how to get the centroid from the resulting clusters. Below follows my code:
from scipy.cluster import hierarchy
from scipy.spatial import distance

Y = distance.pdist(features)                                    # condensed pairwise Euclidean distances
Z = hierarchy.linkage(Y, method="average", metric="euclidean")  # average-linkage agglomerative clustering
T = hierarchy.fcluster(Z, 100, criterion="maxclust")            # flat cluster labels, at most 100 clusters
I am taking my matrix of features, computing the Euclidean distances between the observations, and then passing the result on to the hierarchical clustering method. From there, I am creating flat clusters, with a maximum of 100 clusters.
Now, based on the flat clusters T, how do I get the 1 x n centroid that represents each flat cluster?
To calculate the centroid of a cluster, take all observations assigned to that cluster, sum them, and divide by the number of observations, i.e. take their mean.
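For example, with the features matrix and the flat cluster labels T from the question, the centroid of (hypothetical) cluster 1 is just a mean over a boolean mask:

# centroid of cluster 1: mean of all rows of `features` whose label in T is 1
centroid_1 = features[T == 1].mean(axis=0)   # 1 x n vector, one value per feature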
Observations are allocated to clusters by drawing a horizontal line through the dendrogram; observations that are joined below the line belong to the same cluster. For example, a cut might give two clusters: one combining A and B, and a second combining C, D, E, and F.
Agglomerative clustering works in a "bottom-up" manner: each object starts as a single-element cluster (a leaf), and at each step of the algorithm the two most similar clusters are merged into a new, bigger cluster (a node).
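If you want the horizontal-line cut described above rather than a fixed number of clusters, fcluster also accepts a distance threshold. A minimal sketch, assuming the linkage matrix Z from the question and a hypothetical cut height:

# cut the dendrogram with a horizontal line at a chosen height
cut_height = 0.7 * Z[:, 2].max()   # hypothetical threshold; column 2 of Z holds the merge distances
T_cut = hierarchy.fcluster(Z, cut_height, criterion="distance")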
A possible solution is a function that returns a codebook with the centroids, like kmeans in scipy.cluster.vq does. The only things you need are the partition vector with the flat cluster assignments (part) and the original observations X:
import numpy as np

def to_codebook(X, part):
    """
    Calculates centroids according to flat cluster assignment

    Parameters
    ----------
    X : array, (n, d)
        The n original observations with d features
    part : array, (n)
        Partition vector. part[n] = c is the cluster assigned to observation n

    Returns
    -------
    codebook : array, (k, d)
        A k x d codebook with k centroids
    """
    codebook = []
    for i in range(part.min(), part.max() + 1):
        codebook.append(X[part == i].mean(0))  # mean of all observations in cluster i
    return np.vstack(codebook)
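Using it with the features matrix and the flat cluster labels T from the question:

codebook = to_codebook(features, T)   # k x d array, one centroid per flat cluster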
You can do something like this (D = number of dimensions):
import numpy as np

# Sum the vectors in each cluster
lens = {}       # will contain the number of observations in each cluster
centroids = {}  # will contain the centroid of each cluster
for idx, clno in enumerate(T):
    centroids.setdefault(clno, np.zeros(D))
    centroids[clno] += features[idx, :]
    lens.setdefault(clno, 0)
    lens[clno] += 1

# Divide by the number of observations in each cluster to get the centroid
for clno in centroids:
    centroids[clno] /= float(lens[clno])
This will give you a dictionary with cluster number as the key and the centroid of the specific cluster as the value.
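If you prefer a single k x D array instead of a dictionary, you can stack the values in order of cluster number, for example:

codebook = np.vstack([centroids[clno] for clno in sorted(centroids)])   # row i corresponds to the i-th smallest cluster label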