Does anyone know a good algorithm for clustering on both discrete and continuous attributes? I am working on a problem of identifying groups of similar customers, and each customer has both discrete and continuous attributes (think customer type, revenue generated by the customer, geographic location, etc.).
Traditionally, algorithms like K-means or EM work for continuous attributes. What if we have a mix of continuous and discrete attributes?
Clustering with discrete variables is possible in Tableau and can be useful in some cases.
K-means and K-medians are 'hard clustering' algorithms: every data point is assigned to exactly one cluster. The number of clusters must be chosen in advance by the analyst. Because means or medians are used to represent the clusters, these algorithms are only appropriate for continuous data and are unsuitable for categorical data.
The primary difference between discrete and continuous data is that discrete data takes distinct, countable values, whereas continuous data can take any of an infinite number of values within a range and is measured rather than counted.
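KModes clustering is an unsupervised machine learning algorithm used to cluster categorical variables. You might be wondering why we need KModes when we already have KMeans. KMeans uses a mathematical distance measure to cluster continuous data. KModes instead counts the attributes on which two records disagree and represents each cluster by its mode (the most frequent value of each attribute).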
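To make the KModes idea concrete, here is a minimal pure-Python sketch, not a reference implementation. It assumes a simple mismatch count as the dissimilarity and takes the starting modes from the caller; the function names (`hamming`, `column_modes`, `k_modes`) and the sample customer records are hypothetical and chosen only for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Return the most frequent value of each attribute across the records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, initial_modes, max_iter=20):
    """Cluster categorical records, starting from caller-supplied modes."""
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each mode becomes the per-attribute mode of its members.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical customers: (customer type, region, sales channel).
customers = [
    ("retail", "north", "web"),
    ("retail", "north", "store"),
    ("retail", "south", "web"),
    ("wholesale", "south", "phone"),
    ("wholesale", "east", "phone"),
    ("wholesale", "south", "store"),
]

labels, modes = k_modes(customers, [customers[0], customers[3]])
print(labels)
print(modes)
```

In practice the starting modes are usually picked at random or by a seeding heuristic and the run is repeated several times; fixing them here just keeps the example deterministic.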
If I remember correctly, the COBWEB algorithm can work with discrete attributes.
You can also apply various 'tricks' to the discrete attributes in order to create meaningful distance metrics.
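The answer does not say which tricks it has in mind. One common option, assumed here purely for illustration, is a Gower-style dissimilarity: numeric attributes contribute their absolute difference scaled by the attribute's range, categorical attributes contribute 0 on a match and 1 on a mismatch, and the contributions are averaged. The function name `gower_distance`, the `numeric_ranges` parameter, and the sample records are hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Average per-attribute dissimilarity between two mixed-type records.

    numeric_ranges maps the index of each numeric attribute to its range
    (max - min over the data set); all other attributes are treated as
    categorical.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            span = numeric_ranges[i]
            # Range-scaled absolute difference, in [0, 1].
            total += abs(x - y) / span if span else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(a)


# Hypothetical customers: (customer type, revenue, region).
customers = [
    ("retail", 1000.0, "north"),
    ("retail", 3000.0, "south"),
    ("wholesale", 5000.0, "north"),
]

revenues = [c[1] for c in customers]
numeric_ranges = {1: max(revenues) - min(revenues)}

# Full pairwise dissimilarity matrix.
matrix = [
    [gower_distance(a, b, numeric_ranges) for b in customers]
    for a in customers
]
for row in matrix:
    print([round(d, 3) for d in row])
```

A matrix like this can then be handed to any clustering method that works from pairwise distances rather than from coordinates, such as hierarchical clustering.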
You could search for clustering of categorical/discrete attributes; one of the first hits is ROCK: A Robust Clustering Algorithm for Categorical Attributes.