I am a little confused about the k-means loss function. What I usually find is the loss function

$$ J \;=\; \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2 ,$$

with $r_{nk}$ being an indicator of whether observation $x_n$ belongs to cluster $k$ and $\mu_k$ being the cluster center. However, in the book by Hastie, Tibshirani and Friedman I find

$$ W \;=\; \sum_{k=1}^{K} N_k \sum_{n=1}^{N} r_{nk} \, \lVert x_n - \mu_k \rVert^2 ,$$

such that clusters with more observations react more sensitively to deviations from the cluster center, since $N_k$ stands for the number of observations in cluster $k$. Does anyone know which one is right? If you have the book "The Elements of Statistical Learning", the derivation is on pages 508-510.
Cheers
Actually, the correct one is the first formula you mention (the unweighted one), and the derivation of the second one in the book is incorrect. The main equation the book uses to derive its formula (equation 14.31 in section 14.3.6) is not correct: it claims an equality between its first line and its second line. As a small counterexample, take one cluster (i.e. $K=1$) and the three points 1, 2, 3. Also, Algorithm 14.1 on page 510 of the book is the one that minimises the first loss function in your question, not their loss function.
I am not saying that their final formula doesn't make sense; it is just that the derivation of this formula seems wrong to me, and the algorithm they show is the one known to minimise your first function. Note that the weights $N_k$ do not appear anywhere in their algorithm: the only thing that determines which cluster a point belongs to is the distance between that point and the associated centroid. $N_k$ has nothing to do with it, which shows that the algorithm is not a solver for their function.
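To make that point concrete, here is a minimal NumPy sketch of a Lloyd-style k-means iteration of the kind Algorithm 14.1 describes (the function name and implementation details are my own illustration, not code from the book). Notice that the assignment step uses only squared distances to the centroids; the cluster sizes $N_k$ never enter.

```python
import numpy as np

def lloyd_kmeans(X, K, n_iter=100, seed=0):
    """Plain Lloyd iteration: alternate nearest-centroid assignment and
    centroid recomputation. The assignment step depends only on distances,
    so the cluster sizes N_k play no role anywhere in the algorithm."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # assignment step: each point joins its nearest centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: each centroid becomes the mean of its assigned points
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Each iteration can only decrease the unweighted objective, which is exactly why this algorithm goes with the first formula rather than the $N_k$-weighted one.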
Moreover, if the clusters are unbalanced, in the sense that some clusters have far fewer points than others, their formula with the $N_k$ weights promotes cutting parts off the big clusters and assigning them to small neighbouring clusters, in order to avoid a large $N_k$ multiplying a large within-cluster scatter.
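To see this effect, one can compute both objectives on the same data under two candidate assignments. The toy data and helper functions below are my own construction (not from the book or this answer): nine points at 0 plus one boundary point at 1 form a big, tight cluster, and a single point at 3 seeds a small neighbouring cluster.

```python
import numpy as np

def unweighted_loss(X, labels):
    """First formula: sum over clusters of squared distances to the cluster mean."""
    return sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
               for k in np.unique(labels))

def weighted_loss(X, labels):
    """Book's formula: the same within-cluster scatter, but each cluster's
    term is multiplied by its size N_k."""
    return sum((labels == k).sum() *
               ((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
               for k in np.unique(labels))

# Big tight cluster (nine points at 0 and one boundary point at 1),
# small cluster seeded by a single point at 3.
X = np.array([[0.0]] * 9 + [[1.0], [3.0]])

labels_keep = np.array([0] * 10 + [1])    # boundary point stays with the big cluster
labels_cut = np.array([0] * 9 + [1, 1])   # boundary point is cut off to the small cluster

print(unweighted_loss(X, labels_keep), unweighted_loss(X, labels_cut))  # 0.9 vs 2.0
print(weighted_loss(X, labels_keep), weighted_loss(X, labels_cut))      # 9.0 vs 4.0
```

The unweighted loss prefers leaving the boundary point with the big cluster (0.9 < 2.0), while the $N_k$-weighted loss prefers cutting it off and handing it to the small neighbouring cluster (4.0 < 9.0), which is exactly the behaviour described above.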