
results of k-means used in R [closed]

I was using R's kmeans function to run the k-means algorithm on a dataset. I have a question about some of the output that I get. The results are:

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.006000    3.428000     1.462000    0.246000

In that case, what does "Cluster means" stand for? Is it the mean of the distances of all the objects within the cluster?

Also, in the last part of the output I have:

Within cluster sum of squares by cluster:
[1] 15.15100 39.82097 23.87947
 (between_SS / total_SS =  88.4 %)

How should that value of 88.4% be interpreted?

Thanks

Little asked Jan 25 '13 15:01


1 Answer

The cluster means combine to give the centroids (centres) of the clusters in the multivariate space defined by the input variables. Hence the set of means that you show for cluster 1 is the set of coordinates of the centroid (centre) of that cluster. They are not means of distances: each one is the mean of the values of one variable over the samples assigned to that cluster.
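To check this by hand, here is a minimal sketch. It assumes the data are the built-in iris measurements (which the column names in the question suggest) and that k = 3; the seed and nstart value are arbitrary choices for illustration, not something taken from the question.

```r
# Assumption: built-in iris data, k = 3; seed and nstart are illustrative.
set.seed(42)
x <- iris[, 1:4]
fit <- kmeans(x, centers = 3, nstart = 25)

# Mean of each variable over the samples assigned to each cluster
manual_centers <- aggregate(x, by = list(cluster = fit$cluster), FUN = mean)

# Compare with the "Cluster means" that kmeans reports (fit$centers)
centers_match <- isTRUE(all.equal(as.matrix(manual_centers[, -1]),
                                  fit$centers,
                                  check.attributes = FALSE))
print(manual_centers)
print(centers_match)
```

Note that the cluster numbering depends on the random start, so the row labelled 1 in your own run may correspond to a different group than in this sketch.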

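The 88.4 % is the proportion of the total variance in your data set that is explained by the clustering. k-means minimises the within-group dispersion (spread) of the samples, measured as the within-cluster sum of squares. Because the total sum of squares is fixed and equals the within-cluster plus the between-cluster sum of squares, this also maximises the between-group dispersion. The total sum of squares is what you would get by treating all samples as a single cluster, so assigning the samples to k clusters rather than one achieved a reduction in the sum of squares of 88.4 %.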
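The ratio can be reproduced from the pieces of the fitted object. This is a sketch under the same assumptions as before (built-in iris data, k = 3, arbitrary seed and nstart); the names total_ss, within_ss and between_ss are introduced here for illustration only.

```r
# Assumption: built-in iris data, k = 3; seed and nstart are illustrative.
set.seed(42)
x <- iris[, 1:4]
fit <- kmeans(x, centers = 3, nstart = 25)

# Total sum of squares: squared deviations of every value from its column mean
total_ss <- sum(scale(x, center = TRUE, scale = FALSE)^2)   # fit$totss

# Within-cluster sum of squares, summed over the clusters
within_ss <- sum(fit$withinss)                              # fit$tot.withinss

# Whatever is not within clusters is between clusters
between_ss <- total_ss - within_ss                          # fit$betweenss

# Percentage reported as (between_SS / total_SS) in the printed output
pct_explained <- 100 * between_ss / total_ss
print(round(pct_explained, 1))
```

A value close to 100 % means the clusters are tight relative to the overall spread. The ratio always increases as k grows, so on its own it does not tell you which k to choose.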

Gavin Simpson answered Nov 05 '22 00:11