I used R's kmeans function to run the k-means algorithm on a dataset, and I have a question about some of the values I got. The results are:
Cluster means:
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.006000 3.428000 1.462000 0.246000
In that case, what does "Cluster means" stand for? Is it the mean of the distances of all the objects within the cluster?
Also in the last part I have:
Within cluster sum of squares by cluster:
[1] 15.15100 39.82097 23.87947
(between_SS / total_SS = 88.4 %)
How should that value of 88.4% be interpreted?
Thanks
The cluster means combine to give the centroids (centres) of the clusters in the multivariate space defined by the input variables. Hence the set of means for cluster 1 that you show are the coordinates of the centroid (centre) for that cluster. They are computed as the mean of the values for each variable for those samples assigned to that cluster.
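To check this yourself, here is a minimal sketch that recomputes the centroids by hand. It assumes your data are the four numeric columns of R's built-in iris dataset (your column names suggest so, but that is a guess), and the seed, `nstart` value, and the name `manual_centres` are my own choices for illustration.

```r
# Assumption: the data are the numeric columns of the built-in iris dataset.
x <- iris[, 1:4]

set.seed(42)  # k-means starts from random centres, so fix the seed
fit <- kmeans(x, centers = 3, nstart = 25)

# Recompute each centroid as the per-variable mean of the samples
# assigned to that cluster.
manual_centres <- aggregate(x, by = list(cluster = fit$cluster), FUN = mean)

print(fit$centers)      # the "Cluster means" block of the printed output
print(manual_centres)   # same values, computed by hand
```

The two tables should agree, which shows that "Cluster means" holds coordinates of cluster centres, not averages of distances.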
The 88.4 % is a measure of the total variance in your data set that is explained by the clustering. k-means minimises the within-group dispersion (spread) of the samples, measured as a sum of squares. Because the total sum of squares is fixed for a given data set, this also maximises the between-group dispersion. By assigning the samples to k clusters rather than treating them all as a single group, you achieve a reduction in the within-cluster sum of squares of 88.4 %.
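The ratio can also be rebuilt from its parts. The sketch below again assumes the built-in iris data and an arbitrary seed; the variable names `total_ss`, `within_ss`, `between_ss` and `ratio` are mine, while `totss`, `tot.withinss` and `betweenss` are components of the object that `kmeans` returns.

```r
# Assumption: the data are the numeric columns of the built-in iris dataset.
x <- as.matrix(iris[, 1:4])

set.seed(42)
fit <- kmeans(x, centers = 3, nstart = 25)

# Total SS: squared distances of every sample to the overall mean,
# i.e. the within-cluster SS you would get with a single cluster.
total_ss <- sum(scale(x, center = TRUE, scale = FALSE)^2)

# Within SS: squared distances of each sample to its own cluster centroid.
within_ss <- sum((x - fit$centers[fit$cluster, ])^2)

# Whatever is not left inside the clusters lies between them.
between_ss <- total_ss - within_ss
ratio <- between_ss / total_ss

print(c(total_ss, fit$totss))            # should match
print(c(within_ss, fit$tot.withinss))    # should match
print(c(ratio, fit$betweenss / fit$totss))
```

A ratio close to 1 means most of the spread in the data lies between the clusters rather than inside them. Note that the ratio always increases as k grows, so it cannot be used on its own to choose the number of clusters.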