I have found the following formulas for Inter-Cluster and Intra-Cluster distances and I am not sure I understand how they work.
Inter-Cluster Distance
Shouldn't there be a square root in formulas above?
Inter-Cluster and Intra-Cluster:
Why is there the j index starting from N+1? And not from 1 to N2?
Which one is the correct one? Or are they equivalent? Or should I just use the distance between centroids for the inter-cluster distance? That seems rather simple. And what about the intra-cluster distance?
I find the Wikipedia formulas http://en.wikipedia.org/wiki/Cluster_analysis#Internal_evaluation even harder to understand.
I need to compute these distances in order to properly group colors and create a reduced color palette, so I'm thinking that the more accurate these distances are, the more accurate the grouping will be (that is, using the formula rather than the distance between centroids for the inter-cluster distance). The vectors are 3-dimensional (RGB components).
A lot of algorithms don't really use "distance".
k-means, for example, minimizes variance, which is the sum-of-squares you are seeing here. Now sum-of-squares is squared Euclidean distance, so one can argue that this algorithm also tries to minimize Euclidean distances; but the "natural" formulation of the algorithm doesn't use Euclidean distances, but sum-of-squares. If I'm not mistaken, the same also holds for Ward clustering: you should compute it using variance, not Euclidean distance.
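To make the sum-of-squares idea concrete for RGB vectors, here is a minimal sketch in plain Python. The function names (`centroid`, `within_cluster_ss`, `squared_distance`) and the sample colors are my own for illustration, and this is not necessarily the exact formula from the question: the intra-cluster measure here is the sum of squared deviations from the cluster mean, and the inter-cluster measure is the squared distance between centroids.

```python
def centroid(points):
    """Component-wise mean of a list of RGB tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_cluster_ss(points):
    """Sum of squared deviations of each point from the cluster centroid."""
    c = centroid(points)
    return sum(squared_distance(p, c) for p in points)


# Illustrative data only: two small groups of RGB colors.
reds = [(250, 10, 10), (240, 20, 0), (255, 0, 20)]
blues = [(10, 10, 250), (0, 20, 240)]

intra_reds = within_cluster_ss(reds)
intra_blues = within_cluster_ss(blues)
inter = squared_distance(centroid(reds), centroid(blues))
print(intra_reds, intra_blues, inter)
```

A good partition in this sense has small intra-cluster values relative to the inter-cluster value; no square root is needed to compare them.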
Note that if you minimize z^2, and z cannot be negative, then you also minimized z.
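As a small illustration of that point, assuming the task is to pick the nearest palette entry for a single color: the squared distance and the true Euclidean distance select the same entry, because squaring preserves the order of non-negative numbers. The helper `nearest_index` and the palette values are hypothetical, chosen only for this sketch.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance (skips the square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(color, palette, metric):
    """Index of the palette entry that minimizes metric(color, entry)."""
    return min(range(len(palette)), key=lambda i: metric(color, palette[i]))


# Illustrative palette: black, white, pure red.
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
color = (200, 30, 40)

by_squared = nearest_index(color, palette, squared_distance)
by_euclid = nearest_index(color, palette, math.dist)  # math.dist needs Python 3.8+
print(by_squared == by_euclid)
```

Note that this equivalence applies to minimizing a single distance. Minimizing a sum of squared distances is not, in general, the same as minimizing a sum of unsquared distances.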
See also: https://stats.stackexchange.com/questions/95793/is-there-an-advantage-to-squaring-dissimilarities-when-using-ward-clustering