
k-means: Same clusters for every execution

Is it possible to get the same k-means clusters on every execution for a particular data set? Just as we can use a fixed seed to reproduce a random value, is there a way to remove the randomness from clustering?

Robin asked Sep 21 '11 13:09



People also ask

Does k-means always result in the same clustering?

Not necessarily. The results are usually similar but not identical. K-means moves the centroids iteratively so that they partition the data better at each step. That update process is deterministic, but the initial centroids have to be chosen first, and this is usually done at random.

Why does k-means clustering give different clusters every time it is run?

K-means clustering does involve a random selection process for the initial centroid guesses, so you may get different results from different runs.
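Since the randomness comes only from the initial centroid guesses, one way to avoid it is to supply the starting centroids yourself. The sketch below assumes R's kmeans, whose centers argument also accepts a matrix of initial centers. Using rows 1 and 51 of the data as starting points is an arbitrary choice made for illustration, not a recommendation.

```r
# Example data: two groups of 2-D points (seeded so the data is reproducible)
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Assumption for illustration: take one point from each half as a start
init <- x[c(1, 51), ]

# With explicit starting centers and the default nstart = 1,
# kmeans does not draw any random initial centroids
fit_a <- kmeans(x, centers = init)
fit_b <- kmeans(x, centers = init)

identical(fit_a$cluster, fit_b$cluster)
```

The trade-off is that the result now depends on how good the chosen starting centers are.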

How many clusters should I use for k-means?

One common criterion picks the number of clusters k that maximizes the average silhouette width over a range of candidate values of k.
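The average-silhouette criterion can be sketched as follows. This assumes the cluster package (normally installed alongside R) for its silhouette function. The candidate range 2:6 and nstart = 10 are arbitrary illustrative choices.

```r
library(cluster)  # provides silhouette()

# Example data: two groups of 2-D points
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
d <- dist(x)  # pairwise distances, reused for every k

candidate_k <- 2:6
avg_sil <- sapply(candidate_k, function(k) {
  set.seed(2)                                 # same seed for each k, for repeatability
  fit <- kmeans(x, centers = k, nstart = 10)
  sil <- silhouette(fit$cluster, d)
  mean(sil[, "sil_width"])                    # average silhouette width for this k
})
names(avg_sil) <- candidate_k

# The candidate with the largest average silhouette width
best_k <- candidate_k[which.max(avg_sil)]
```

Printing avg_sil shows the score for each candidate, and best_k holds the value this criterion would select for this data.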

Which of the following is not true about k-means clustering?

The statement that is NOT true about k-means clustering is: "As k-means is an iterative algorithm, it guarantees that it will always converge to the global optimum." K-means can stop at a local optimum that depends on the initial centroids. The statement "Customer segmentation is a supervised way of clustering data, based on the similarity of customers to each other" is also false, because clustering is unsupervised.


1 Answer

Yes. Call set.seed to fix the seed of the random number generator immediately before doing the clustering.

Using the example from the kmeans help page:

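# Two groups of 2-D points, seeded so the data itself is reproducible
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")

# Reset the seed to the same value before each call to kmeans
set.seed(2)
XX <- kmeans(x, 2)

set.seed(2)
YY <- kmeans(x, 2)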

Test for equality:

identical(XX, YY)
[1] TRUE
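To avoid forgetting the set.seed call, the two steps can be wrapped together. The helper below is a minimal sketch. seeded_kmeans is a hypothetical name, not a base R function, and the default seed of 42 is arbitrary. The same approach should work with nstart > 1, since the extra random starts are drawn from the same seeded generator.

```r
# Hypothetical helper (not part of base R): fix the seed, then cluster
seeded_kmeans <- function(data, centers, seed = 42, ...) {
  set.seed(seed)              # reset the RNG so the random starts are repeatable
  kmeans(data, centers, ...)  # extra arguments such as nstart pass through
}

# Same example data as above
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

fit_a <- seeded_kmeans(x, 2, nstart = 10)
fit_b <- seeded_kmeans(x, 2, nstart = 10)

identical(fit_a$cluster, fit_b$cluster)
```

Note that the helper resets the global random number generator as a side effect, so any random draws made after it will also be affected.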
Andrie answered Oct 11 '22 21:10
