How to get Agglomerative Clustering "Centroid" in python Scikit-learn

This is the code I am using to compute silhouette_score with Agglomerative Clustering (Ward linkage). I would like to get the "centroid" of each cluster. Is that possible with Agglomerative Clustering? So far I have only been able to get centroids from K-means and fuzzy c-means.

df1
       Height  time_of_day  resolution
272  1.567925     1.375000    0.594089
562  1.807508     1.458333    0.594089
585  2.693542     0.416667    0.594089
610  1.036305     1.458333    0.594089
633  1.117111     0.416667    0.594089
658  1.542407     1.458333    0.594089
681  1.930844     0.416667    0.594089
802  1.505548     1.458333    0.594089
808  1.009369     1.708333    0.594089


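import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_samples, silhouette_score

def clustering(df1):
    X = df1.values
    range_n_clusters = [2, 3, 4]
    for n_clusters in range_n_clusters:
        # Fit Ward-linkage agglomerative clustering and keep the cluster labels
        clusterer = AgglomerativeClustering(n_clusters=n_clusters, linkage='ward')
        cluster_labels = clusterer.fit_predict(X)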

        silhouette_avg = silhouette_score(X, cluster_labels)
        if silhouette_avg > 0.4:
            print("For n_clusters =", n_clusters,
                  "The average silhouette_score is :", silhouette_avg)
            fig, (ax1, ax2) = plt.subplots(1, 2)

            fig.set_size_inches(15, 5)

            ax1.set_xlim([-0.1, 1])
            ax1.set_ylim([0, len(X) + (n_clusters + 1) * 10])

            sample_silhouette_values = silhouette_samples(X, cluster_labels)

            y_lower = 10
            for i in range(n_clusters):
                ith_cluster_silhouette_values = \
                    sample_silhouette_values[cluster_labels == i]

                ith_cluster_silhouette_values.sort()

                size_cluster_i = ith_cluster_silhouette_values.shape[0]
                y_upper = y_lower + size_cluster_i

                color = cm.nipy_spectral(float(i) / n_clusters)
                ax1.fill_betweenx(np.arange(y_lower, y_upper),
                                  0, ith_cluster_silhouette_values,
                                  facecolor=color, edgecolor=color, alpha=0.7)
                ax1.text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))
                y_lower = y_upper + 10  # 10 for the 0 samples

            ax1.set_title("The silhouette plot for the various clusters.")
            ax1.set_xlabel("The silhouette coefficient values")
            ax1.set_ylabel("Cluster label")
            ax1.axvline(x=silhouette_avg, color="red", linestyle="--")

            ax1.set_yticks([])  # Clear the yaxis labels / ticks
            ax1.set_xticks([-0.1, 0, 0.2, 0.4, 0.6, 0.8, 1])
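            # Swap the unused 2D right-hand axes for a 3D axes to hold the scatter plot
            ax2.remove()
            ax = fig.add_subplot(1, 2, 2, projection='3d')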
            colors = cm.nipy_spectral(cluster_labels.astype(float) / n_clusters)
            ax.scatter(X[:, 1], X[:, 2], X[:, 0],marker='o', s=20, lw=0, alpha=0.7,
                        c=colors, edgecolor='k')

            plt.suptitle(("Silhouette analysis for HAC-ward clustering on sample data "
                          "with n_clusters = %d" % n_clusters),
                         fontsize=14, fontweight='bold')

    plt.show()  
    return

clusterer = AgglomerativeClustering(n_clusters=n_clusters, linkage='ward')  
clusterer.fit_predict(X)
cluster_labels = clusterer.labels_

The snippet above is the part that is specific to the Agglomerative Clustering method.

from scipy.cluster.hierarchy import centroid, fcluster

from scipy.spatial.distance import pdist

cluster = AgglomerativeClustering(n_clusters=4, affinity='euclidean', linkage='ward')  
y = pdist(df1)

y

I have also tried this code, but I am not sure whether 'y' is the correct centroid.

from sklearn.neighbors.nearest_centroid import NearestCentroid
clf = NearestCentroid()
clf.fit(df1["Height"],df1["time_of_day"])
print(clf.centroids_)

Here I tried another method to get the X, Y centroids, but it raises an error.

Please advise me whether I can get centroids from Agglomerative Clustering or whether I should stick to fuzzy c-means.

Thanks

asked Jun 05 '19 08:06

Pandalove

1 Answer

I believe you can use Agglomerative Clustering and get the centroids with NearestCentroid; you just need to make a small adjustment to your code. Here is what worked for me:

from sklearn.neighbors import NearestCentroid

y_predict = clusterer.fit_predict(X)
#...
clf = NearestCentroid()
clf.fit(X, y_predict)
print(clf.centroids_)

I think the only thing missing in your code is that you don't keep the labels returned by fit_predict() and pass them to NearestCentroid.fit() as the second argument. You can also try a dendrogram for better visualization. The full code can be found here.
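As a rough, self-contained sketch of the idea above: the array X simply copies the nine rows of df1 from the question, and n_clusters = 3 is an arbitrary value picked for illustration, not something taken from the original post. Assuming NearestCentroid is left at its default Euclidean metric with no shrinkage, its centroids should match the per-cluster means computed by hand:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import NearestCentroid

# Rows of df1 from the question (columns: Height, time_of_day, resolution).
X = np.array([
    [1.567925, 1.375000, 0.594089],
    [1.807508, 1.458333, 0.594089],
    [2.693542, 0.416667, 0.594089],
    [1.036305, 1.458333, 0.594089],
    [1.117111, 0.416667, 0.594089],
    [1.542407, 1.458333, 0.594089],
    [1.930844, 0.416667, 0.594089],
    [1.505548, 1.458333, 0.594089],
    [1.009369, 1.708333, 0.594089],
])

n_clusters = 3  # assumption: chosen only for this illustration

# Agglomerative clustering has no centroids of its own; it only yields labels.
clusterer = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
labels = clusterer.fit_predict(X)

# Treat the cluster labels as classes; one centroid per label,
# stored in the same order as clf.classes_.
clf = NearestCentroid()
clf.fit(X, labels)
centroids = clf.centroids_

# Cross-check: average the rows of each cluster by hand.
manual = np.array([X[labels == k].mean(axis=0) for k in clf.classes_])

print(centroids)
print(np.allclose(centroids, manual))
```

If you would rather not pull in NearestCentroid at all, the `manual` line on its own is enough to get the cluster means. Because the resolution column is constant in this sample, some scikit-learn versions may emit a zero-variance warning during fit; that does not affect the centroids.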

answered Sep 30 '22 11:09

Ibrahim.H

Tags: python, pandas, cluster-analysis, scikit-learn, centroid