
Removing outliers from a k-means cluster

I have a number of small data sets, each containing 10 XY coordinates. I am using Matlab (R2012a) and k-means to obtain a centroid. In some of the clusters (see figure below) I can see some extreme points. Because my data sets are so small, a single outlier ruins the value of my centroid. Is there an easy way to exclude these points? Supposedly Matlab has an 'exclude outliers' function, but I can't find it anywhere in the tool menu. Thank you for your help! (And yes, I am new to this :-)

[figure: image not available]

carro asked Dec 21 '12 11:12



1 Answer

k-means can be quite sensitive to outliers in your data set. The reason is simply that k-means minimizes the sum of squared distances to the centroids, so a large deviation (such as that of an outlier) gets a lot of weight.
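To make the effect concrete, here is a minimal sketch in plain Python (not Matlab, and not the asker's data). The ten points are made up for illustration: nine lie near (1, 1) and one is an extreme point. With a single cluster, the k-means centroid is just the per-coordinate mean, so the sketch compares the mean with and without the extreme point and checks how much of the sum of squares that one point accounts for.

```python
from statistics import mean

# Hypothetical data set of 10 XY coordinates: nine points near (1, 1)
# plus one extreme point at (10, 10). These values are invented.
points = [(0.9, 1.1), (1.0, 0.9), (1.1, 1.0), (0.8, 1.0), (1.2, 1.1),
          (1.0, 1.2), (0.9, 0.8), (1.1, 0.9), (1.0, 1.0), (10.0, 10.0)]


def centroid(pts):
    """Per-coordinate mean: the point that minimizes the sum of squared distances."""
    xs, ys = zip(*pts)
    return (mean(xs), mean(ys))


def squared_distance(p, c):
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2


with_outlier = centroid(points)          # pulled toward (10, 10)
without_outlier = centroid(points[:-1])  # stays near (1, 1)

# Fraction of the total sum of squares contributed by the extreme point alone.
total = sum(squared_distance(p, with_outlier) for p in points)
outlier_share = squared_distance(points[-1], with_outlier) / total

print("centroid with outlier:   ", with_outlier)
print("centroid without outlier:", without_outlier)
print("outlier share of sum of squares:", round(outlier_share, 3))
```

On this toy data the single extreme point moves the centroid well away from where the other nine points sit, and it dominates the sum of squares, which is exactly why k-means gives it so much weight.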

If you have a noisy data set with outliers, you might be better off using an algorithm that has specialized noise handling such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Note the "N" in the acronym: Noise. In contrast to e.g. k-means, but also many other clustering algorithms, DBSCAN can decide to not cluster objects that are in regions of low density.
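As a rough illustration of that behaviour, the following is a small, unoptimized DBSCAN sketch in plain Python (standard library only, Python 3.8+ for `math.dist`). It is not a library implementation; the data points and the parameter values `eps=0.5` and `min_pts=3` are assumptions chosen to suit this toy example, and real data would need its own values.

```python
from math import dist
from statistics import mean

NOISE = -1  # label used here for points that end up in no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may still become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep growing
    return labels


# Hypothetical data: nine points near (1, 1) plus one extreme point.
points = [(0.9, 1.1), (1.0, 0.9), (1.1, 1.0), (0.8, 1.0), (1.2, 1.1),
          (1.0, 1.2), (0.9, 0.8), (1.1, 0.9), (1.0, 1.0), (10.0, 10.0)]

labels = dbscan(points, eps=0.5, min_pts=3)

# Compute the centroid from the clustered points only, ignoring noise.
kept = [p for p, label in zip(points, labels) if label != NOISE]
clean_centroid = (mean(x for x, _ in kept), mean(y for _, y in kept))

print(labels)
print(clean_centroid)
```

Here the extreme point is left unclustered, so a centroid computed from the remaining points is no longer dragged toward it. The result depends on the two parameters: too small an `eps` marks everything as noise, too large an `eps` pulls the outlier back in.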

Erich Schubert answered Sep 29 '22 11:09
