k-means with selected initial centers

I am trying to run k-means clustering with selected initial centroids. It says here that, to specify your initial centers:

init : {‘k-means++’, ‘random’ or an ndarray} 

If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.

My code in Python:

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[-19.0748, -8.536],
              [22.0108, -10.9737],
              [12.6597, 19.2601]], np.float64)
# data: the dataset being clustered (not shown)
km = KMeans(n_clusters=3, init=X).fit(data)
# print(km)
centers = km.cluster_centers_
print(centers)

Returns an error:

RuntimeWarning: Explicit initial center position passed: performing only one init in k-means instead of n_init=10
  n_jobs=self.n_jobs)

and returns the same initial centers. Any idea how to form the initial centers so that they are accepted?

Asked by lel on Mar 04 '15 at 18:03



1 Answer

The default behavior of KMeans is to run the algorithm multiple times, each time starting from different initial centroids. These are chosen with k-means++ by default; passing init='random' instead picks random data points, i.e. the Forgy method. The number of initializations is controlled by the n_init= parameter (docs):

n_init : int, default: 10

Number of time the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia.
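To make the "best output in terms of inertia" part concrete, here is a minimal sketch that does the selection by hand. It is only an illustration of the idea, not scikit-learn's internal code: the data is made up, and the variable names (data, runs, best) are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical data: three 2-D blobs, made up for illustration.
data = np.vstack([rng.normal(loc=c, scale=1.0, size=(40, 2))
                  for c in ([0, 0], [6, 0], [3, 5])])

# Run k-means once per seed (n_init=1 each time), then keep the run
# with the lowest inertia. This mimics by hand what n_init > 1 does.
runs = [KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(data)
        for seed in range(5)]
best = min(runs, key=lambda km: km.inertia_)
print(best.inertia_)
```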

If you pass an array as the init= argument then only a single initialization will be performed using the centroids explicitly specified in the array. You are getting a RuntimeWarning because you are still passing the default value of n_init=10 (here are the relevant lines of source code).

It's actually totally fine to ignore this warning, but you can make it go away completely by passing n_init=1 if your init= parameter is an array.
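As a rough sketch of that fix, the snippet below passes the question's three centers as init= together with n_init=1. The dataset is not shown in the question, so data here is a synthetic stand-in generated around those centers; treat it as an assumption made only so the example runs.

```python
import numpy as np
from sklearn.cluster import KMeans

# Initial centers from the question, shape (n_clusters, n_features) = (3, 2).
init_centers = np.array([[-19.0748, -8.536],
                         [22.0108, -10.9737],
                         [12.6597, 19.2601]], dtype=np.float64)

# Hypothetical stand-in for the asker's data: 50 noisy points per center.
rng = np.random.default_rng(0)
data = np.vstack([c + rng.normal(scale=2.0, size=(50, 2)) for c in init_centers])

# n_init=1 matches the single explicit initialization, so no warning is raised.
km = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(data)
centers = km.cluster_centers_
print(centers)
```

The fitted centers will generally differ from the initial ones unless the initial centers already sit at the cluster means.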

Answered by ali_m on Oct 29 '22 at 20:10