
Questions on clustering methods

I recently started studying clustering in data mining, and so far I've covered sequential clustering, hierarchical clustering and k-means.

I also read a statement that distinguishes k-means from the other two techniques: it said k-means is not very good at dealing with nominal attributes, but the text didn't explain why. So far, the only difference I can see is that with k-means we know in advance that we need exactly K clusters, while with the other two methods we don't know how many clusters we will need.

Could anybody explain why this statement holds, i.e. why k-means has a problem with examples that have nominal attributes? And is there a way to overcome it?

Thanks in advance.

Kevin asked Nov 04 '10 15:11


1 Answer

The k-means algorithm calculates each cluster's centroid by taking the mean of all the points in the cluster. If an attribute is nominal, you can't take a mean value of it.
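To make that concrete, here is a minimal sketch of the centroid step in plain Python. The `centroid` helper and the sample data are made up for illustration and are not taken from any particular library. The mean works for numeric points and simply fails for nominal ones:

```python
def centroid(points):
    """Per-coordinate mean of the points (hypothetical helper for the k-means centroid step)."""
    n = len(points)
    # zip(*points) groups the values of each attribute together
    return tuple(sum(coord) / n for coord in zip(*points))


# Numeric attributes: the mean is well defined.
numeric_cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
numeric_centroid = centroid(numeric_cluster)

# Nominal attribute: there is no meaningful way to add or average the labels.
nominal_cluster = [("Windows",), ("Linux",), ("OSX",)]
try:
    centroid(nominal_cluster)
    nominal_ok = True
except TypeError:
    # sum() cannot add strings to its numeric start value
    nominal_ok = False

print(numeric_centroid, nominal_ok)
```

The distance computation that assigns points to the nearest centroid has the same issue, since it also relies on arithmetic over the attribute values.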

Sometimes nominal values can be put into some kind of order and then mapped to real values. For example, days of the week could be mapped onto the range [1.0, 7.0]. Sometimes that isn't possible, though: an attribute with the values [Windows, Linux, OSX] has no natural order.
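As a rough sketch of both cases: the day mapping below follows the [1.0, 7.0] idea above, while the one-hot encoding for the operating systems is a commonly used workaround that I'm adding as an assumption, not something the mean requires or the only option. All names here (`DAYS`, `one_hot`, `OS_CATEGORIES`) are hypothetical.

```python
# Ordered values: map each day to a real number in [1.0, 7.0].
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
day_to_value = {day: float(i) for i, day in enumerate(DAYS, start=1)}

# The mean of Monday and Wednesday now lands on Tuesday's value.
day_mean = (day_to_value["Mon"] + day_to_value["Wed"]) / 2


def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector with one slot per category."""
    return tuple(1.0 if value == c else 0.0 for c in categories)


# Unordered values: one-hot encode instead of inventing an order.
OS_CATEGORIES = ["Windows", "Linux", "OSX"]
cluster = ["Windows", "Linux", "Linux", "Linux"]
encoded = [one_hot(v, OS_CATEGORIES) for v in cluster]

# The centroid of the encoded points is the share of each category in the cluster.
n = len(encoded)
os_centroid = tuple(sum(col) / n for col in zip(*encoded))

print(day_mean, os_centroid)
```

With one-hot encoding the centroid is no longer a real operating system, just a vector of proportions, so the result needs to be interpreted with care. Variants such as k-modes avoid this by using the most frequent value instead of the mean.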

Stompchicken answered Nov 15 '22 10:11