
Clustering Algorithm with discrete and continuous attributes?

Does anyone know a good algorithm for clustering on both discrete and continuous attributes? I am working on a problem of identifying groups of similar customers, and each customer has both discrete and continuous attributes (think type of customer, amount of revenue generated by the customer, geographic location, etc.).

Traditionally, algorithms like K-means or EM work for continuous attributes. What if we have a mix of continuous and discrete attributes?

Matt W asked May 06 '09 13:05



1 Answer

If I remember correctly, the COBWEB algorithm can work with discrete attributes.

You can also apply various 'tricks' to the discrete attributes to create a meaningful distance metric, for example by counting a mismatch between two values as distance 1 and a match as distance 0.

You could also search for clustering of categorical/discrete attributes. One of the first hits is "ROCK: A Robust Clustering Algorithm for Categorical Attributes".
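
As a rough illustration of one such 'trick', here is a minimal sketch of a Gower-style distance, which is not something named in the question or answer but is a common way to combine the two attribute kinds: continuous attributes contribute their absolute difference scaled by the attribute's range, and discrete attributes contribute 0 on a match and 1 on a mismatch. The function name `gower_distance`, the `numeric_ranges` argument, and the sample customer records are all made up for this example.

```python
def gower_distance(a, b, numeric_ranges):
    """Average per-attribute dissimilarity between two records (dicts).

    numeric_ranges maps each continuous attribute name to (min, max);
    every other attribute is treated as discrete.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            lo, hi = numeric_ranges[key]
            span = hi - lo
            # Range-scaled absolute difference, so the term lies in [0, 1].
            total += abs(a[key] - b[key]) / span if span else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


# Hypothetical customer records mixing discrete and continuous attributes.
customers = [
    {"type": "retail", "region": "EU", "revenue": 1000.0},
    {"type": "retail", "region": "EU", "revenue": 1200.0},
    {"type": "wholesale", "region": "US", "revenue": 9000.0},
]

revenues = [c["revenue"] for c in customers]
ranges = {"revenue": (min(revenues), max(revenues))}

d01 = gower_distance(customers[0], customers[1], ranges)
d02 = gower_distance(customers[0], customers[2], ranges)
print(d01, d02)
```

The resulting pairwise distances could then be passed to any clustering method that accepts a precomputed distance matrix (hierarchical clustering or k-medoids, for instance) rather than to plain K-means, which needs to compute means of the attributes.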

Anonymous answered Sep 28 '22 11:09