I am reading about the difference between k-means clustering and k-medoid clustering.
Supposedly there is an advantage to the pairwise dissimilarity measure used in the k-medoid algorithm, compared with the more familiar sum of squared Euclidean distances used to evaluate variance in k-means. And apparently this different distance measure somehow reduces the influence of noise and outliers.
I have seen this claim but I have yet to see any good reasoning as to the mathematics behind this claim.
What makes the pairwise distance measure commonly used in k-medoid better? More exactly, how does the lack of a squared term allow k-medoids to have the desirable properties associated with the concept of taking a median?
K-means attempts to minimize the total squared error, while k-medoids minimizes the sum of dissimilarities between points labeled to be in a cluster and a point designated as the center of that cluster. In contrast to the k-means algorithm, k-medoids chooses data points as centers (medoids or exemplars).
The main difference between K-Means and K-Medoids is that K-Means forms clusters based on the distance of observations to each centroid (a computed mean), while K-Medoids forms clusters based on the distance to each medoid (an actual observation chosen as the cluster representative).
K-medoid computes all the pairwise distances, so it runs in O(n^2*k*i), whereas k-means runs in O(n*k*i), where k times the number of iterations i is usually much smaller than n (k*i << n). Hope this answer helps.
K-Medoids is more robust than K-Means: K-Medoids finds k representative objects that minimize the sum of dissimilarities to the data objects, whereas K-Means minimizes the sum of squared Euclidean distances. This unsquared dissimilarity measure reduces the influence of noise and outliers.
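To make the two objectives concrete, here is a minimal sketch of the quantities each algorithm minimizes (the function names and the example dissimilarity are my own, just for illustration):

```python
import numpy as np

def kmeans_cost(X, labels, centroids):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    return sum(np.sum((X[labels == j] - centroids[j]) ** 2)
               for j in range(len(centroids)))

def kmedoids_cost(X, labels, medoid_indices, dissimilarity):
    """Sum of (unsquared) dissimilarities from each point to its cluster medoid,
    where each medoid is itself one of the data points."""
    return sum(dissimilarity(X[i], X[medoid_indices[labels[i]]])
               for i in range(len(X)))

# Example dissimilarity: plain Euclidean distance (any measure could be plugged in here).
euclidean = lambda a, b: np.linalg.norm(a - b)

# Tiny example: two 1-D clusters.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
print(kmeans_cost(X, labels, np.array([[0.5], [10.5]])))  # 1.0 (squared errors to centroids)
print(kmedoids_cost(X, labels, [0, 2], euclidean))        # 2.0 (distances to medoid points)
```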
First of all, you can use k-medoids with any dissimilarity measure. K-means, however, may fail to converge; it really must only be used with distances that are consistent with the mean. So, e.g., absolute Pearson correlation must not be used with k-means, but it works well with k-medoids.
Secondly, the medoid as used by k-medoids is roughly comparable to the median (in fact, there is also k-medians, which is like k-means but for Manhattan distance). If you look up literature on the median, you will see plenty of explanations and examples why the median is more robust to outliers than the arithmetic mean. Essentially, these explanations and examples will also hold for the medoid. It is a more robust estimate of a representative point than the mean as used in k-means.
Consider this 1-dimensional example:
[1, 2, 3, 4, 100000]
Both the median and medoid of this set are 3. The mean is 20002.
Which do you think is more representative of the data set? The mean has the lower squared error, but assuming that there might be a measurement error in this data set ...
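A minimal sketch reproducing those numbers (the medoid is computed by brute force as the data point with the smallest summed distance to all the others):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 100000], dtype=float)

mean = x.mean()        # 20002.0 -- dragged toward the outlier
median = np.median(x)  # 3.0

# Medoid: the data point with the smallest sum of absolute distances to all other points.
pairwise = np.abs(x[:, None] - x[None, :])
medoid = x[pairwise.sum(axis=1).argmin()]  # 3.0

print(mean, median, medoid)
```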
Technically, the notion of breakdown point is used in statistics. The median has a breakdown point of 50% (i.e. half of the data points can be incorrect, and the result is still unaffected), whereas the mean has a breakdown point of 0 (i.e. a single large observation can yield a bad estimate).
I do not have a proof, but I assume the medoid will have a similar breakdown point as the median.
That's the main drawback. Usually, PAM takes much longer to run than k-means. As it involves computing all pairwise distances, it is O(n^2*k*i); whereas k-means runs in O(n*k*i), where usually k times the number of iterations is k*i << n.
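To illustrate where that cost comes from, here is a rough sketch of the distance computations involved (using SciPy's cdist; the data and sizes are arbitrary, purely for illustration):

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
k = 3

# PAM-style k-medoids typically works from the full pairwise matrix: n * n distances.
pam_distances = cdist(X, X)             # shape (1000, 1000)

# k-means only needs distances from every point to the k current centroids: n * k per iteration.
centroids = X[rng.choice(len(X), k, replace=False)]
kmeans_distances = cdist(X, centroids)  # shape (1000, 3)

print(pam_distances.shape, kmeans_distances.shape)
```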
I think this has to do with the selection of the center of the cluster. k-means will select the "center" of the cluster, while k-medoid will select the "most centered" member of the cluster. In a cluster with outliers (i.e. points far away from the other members of the cluster), k-means will place the center of the cluster towards the outliers, whereas k-medoid will select one of the more clustered members (the medoid) as the center.
It now depends on what you use the clustering for. If you just want to classify a bunch of objects, then you don't really care where the center is; but if the clustering is used to train a classifier that will then classify new objects based on those center points, then k-medoid will give you a center closer to where a human would place it.
In Wikipedia's words:
"It [k-medoid] is more robust to noise and outliers as compared to k-means because it minimizes a sum of pairwise dissimilarities instead of a sum of squared Euclidean distances."
Here's an example:
Suppose you want to cluster on one dimension with k=2. One cluster has most of its members around 1000 and the other around -1000; but there is an outlier (or noise) at 100000. It obviously belongs to the cluster around 1000, but k-means will pull the center point away from 1000 and towards 100000. This may even cause some of the members of the 1000 cluster (say a member with value 500) to be assigned to the -1000 cluster. k-medoid will select one of the members around 1000 as the medoid; it'll probably select one that is bigger than 1000, but it will not select an outlier.
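A minimal sketch of that pull, using a handful of made-up values around 1000 plus the outlier (the exact numbers are illustrative only):

```python
import numpy as np

# Members of one cluster around 1000, plus an outlier at 100000.
cluster = np.array([980.0, 990.0, 1000.0, 1010.0, 1020.0, 1030.0, 100000.0])

mean_center = cluster.mean()  # ~15147 -- pulled far away from the bulk of the cluster

# Medoid: the member with the smallest total distance to the other members.
pairwise = np.abs(cluster[:, None] - cluster[None, :])
medoid_center = cluster[pairwise.sum(axis=1).argmin()]  # 1010.0 -- an actual member near 1000

print(mean_center, medoid_center)
```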
Just a tiny note to add to @Eli's answer: k-medoid is more robust to noise and outliers than k-means because the latter selects a cluster center that is usually just a "virtual point", whereas the former chooses an actual object from the cluster.
Suppose you have five 2D points in one cluster, with coordinates (1,1), (1,2), (2,1), (2,2), and (100,100). If we don't consider object exchanges among the clusters, k-means will give you the cluster center (21.2, 21.2), which is pulled heavily toward the point (100,100). K-medoid, however, will choose the center from among (1,1), (1,2), (2,1), and (2,2) according to its algorithm.
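A quick sketch with those five points (plain NumPy; the medoid here is taken as the point with the smallest summed Euclidean distance to the others):

```python
import numpy as np

points = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [100, 100]], dtype=float)

centroid = points.mean(axis=0)  # [21.2, 21.2] -- dragged toward (100, 100)

# Medoid: the actual point with the smallest summed Euclidean distance to all the others.
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
medoid = points[dists.sum(axis=1).argmin()]  # [2., 2.] -- a real member of the tight cluster

print(centroid, medoid)
```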
Here is a fun applet (E.M. Mirkes, K-means and K-medoids applet, University of Leicester, 2011) with which you can randomly generate a dataset in the 2D plane and compare the k-medoid and k-means learning processes.