What is the difference between the "k-means" and "fuzzy c-means" objective functions?

2 Answers

K-means clustering and fuzzy c-means clustering are very similar in approach. The main difference is that, in fuzzy c-means clustering, each point has a weighting associated with every cluster, so a point doesn't sit "in a cluster" so much as have a weak or strong association with each cluster, determined by the inverse distance to that cluster's center.
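
A minimal sketch of how such a weighting is usually computed, assuming the standard FCM membership formula and Euclidean distance. Strictly speaking the weights are not raw inverse distances: they depend on the ratios of the distances to all centers and are normalized so that each point's memberships sum to 1. The function name `fcm_memberships` and the default fuzzifier m = 2 are illustrative assumptions, not part of the answer.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Membership of `point` in each cluster under the standard FCM update
    u_j = 1 / sum_l (d_j / d_l) ** (2 / (m - 1)), where d_j is the Euclidean
    distance to center j and m > 1 is the fuzzifier (m=2.0 is just a default)."""
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:
        # A point lying exactly on a center belongs to that center only.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dl) ** p for dl in dists) for dj in dists]

centers = [(0.0, 0.0), (3.0, 0.0)]
print(fcm_memberships((1.0, 0.0), centers))         # → [0.8, 0.2]
print(fcm_memberships((1.0, 0.0), centers, m=3.0))  # → roughly [0.667, 0.333]
```

The point is twice as far from the second center, so it gets the smaller weight; raising m makes the memberships fuzzier (closer to equal).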

Fuzzy c-means will tend to run slower than k-means, since it is actually doing more work. Both algorithms compute the distance from each point to each cluster center, but k-means only needs those distances to pick the nearest center, whereas fuzzy c-means has to turn them into a full inverse-distance weighting for every point-cluster pair and then use all of those weights when updating the centers.
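
To make the difference in work concrete, here is a minimal sketch of one iteration of each algorithm in pure Python, assuming Euclidean distance, the standard FCM update rules and a fuzzifier m = 2. The function names and the empty-cluster policy in `kmeans_step` are illustrative assumptions.

```python
import math

def kmeans_step(points, centers):
    """One hard k-means iteration: assign each point to its nearest center,
    then move each center to the mean of the points assigned to it."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x in points:
        # All k distances are computed, but only the nearest center is kept.
        j = min(range(k), key=lambda c: math.dist(x, centers[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]  # each point updates exactly one center
    # An empty cluster keeps its previous center (one possible policy).
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
            for j in range(k)]

def fcm_step(points, centers, m=2.0):
    """One fuzzy c-means iteration: compute a membership for every
    (point, cluster) pair, then move each center to the u**m-weighted
    mean of all the points."""
    k, dim = len(centers), len(centers[0])
    p = 2.0 / (m - 1.0)
    num = [[0.0] * dim for _ in range(k)]
    den = [0.0] * k
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs to that center only.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            u = [h / sum(hits) for h in hits]
        else:
            u = [1.0 / sum((dj / dl) ** p for dl in dists) for dj in dists]
        for j in range(k):
            w = u[j] ** m  # membership raised to the fuzzifier ("stiffness")
            den[j] += w
            for d in range(dim):
                num[j][d] += w * x[d]  # every point updates every center
    return [[v / den[j] for v in num[j]] for j in range(k)]

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
print(kmeans_step(points, centers))  # → [[0.0, 0.5], [10.0, 0.5]]
print(fcm_step(points, centers))     # close to the above, each center pulled slightly toward the other group
```

The inner loop of `fcm_step` runs over all k centers for every point, while `kmeans_step` touches one center per point; that is where the extra cost comes from.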

answered Oct 13 '22 14:10

Reed Copsey

BTW, the Fuzzy-C-Means (FCM) clustering algorithm is also known as Soft K-Means.

The objective functions are virtually identical, the only difference being the introduction of a vector which expresses the degree (percentage) to which a given point belongs to each of the clusters. This vector is raised to a "stiffness" exponent m > 1 (usually called the fuzzifier), aimed at giving more importance to the stronger connections (and conversely at minimizing the weight of the weaker ones). Incidentally, the crisp case is reached as m tends towards 1: the exponent 2/(m - 1) in FCM's membership update then grows without bound, the membership vectors become binary (so together they form a 0/1 matrix), and the FCM model becomes identical to that of k-means. As m tends towards infinity, every point instead belongs equally (1/k) to every cluster.
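
For reference, this is the usual textbook way of writing the two objective functions side by side. The notation (n points x_i, k centers c_j, memberships u_ij, fuzzifier m) is my own choice, not taken from the answer.

$$
J_{\text{k-means}} = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}\, \lVert x_i - c_j \rVert^2,
\qquad u_{ij} \in \{0, 1\},\quad \sum_{j=1}^{k} u_{ij} = 1
$$

$$
J_{\text{FCM}} = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{\,m}\, \lVert x_i - c_j \rVert^2,
\qquad u_{ij} \in [0, 1],\quad \sum_{j=1}^{k} u_{ij} = 1,\quad m > 1
$$

Minimizing J_FCM alternately over the memberships and the centers gives the standard update rules

$$
u_{ij} = \left( \sum_{l=1}^{k} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_l \rVert} \right)^{\frac{2}{m-1}} \right)^{-1},
\qquad
c_j = \frac{\sum_{i=1}^{n} u_{ij}^{\,m}\, x_i}{\sum_{i=1}^{n} u_{ij}^{\,m}}
$$

When m approaches 1, every ratio above 1 is driven to infinity and every ratio below 1 to zero, so u_ij tends to 1 for the nearest center and 0 elsewhere, and the center update becomes the plain k-means mean. When m grows large, the exponent tends to 0, every term tends to 1, and u_ij tends to 1/k.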

I think that, except for a possible issue with clusters which have no points assigned to them (their centers would be undefined), it is possible to emulate the k-means algorithm with the FCM one by simulating the limiting case in which memberships become crisp: introduce a function which changes the biggest value in each membership vector to 1 and zeroes out the other values, in lieu of exponentiating the vector. This is of course a very inefficient way of running k-means, because the algorithm then has to perform as many operations as a true FCM (if only with 1 and 0 values, which does simplify the arithmetic, but not the complexity).
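
A minimal sketch of that emulation, assuming Euclidean distance and the standard FCM memberships; the helper names `fcm_memberships`, `harden` and `fcm_step_hardened` are hypothetical. The hardened weights replace u ** m in the FCM center update, and the result is the k-means mean.

```python
import math

def fcm_memberships(x, centers, m=2.0):
    """Standard FCM membership vector of point x (Euclidean distance)."""
    dists = [math.dist(x, c) for c in centers]
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dl) ** p for dl in dists) for dj in dists]

def harden(u):
    """Set the biggest value of u to 1 and zero out the others."""
    top = max(range(len(u)), key=lambda i: u[i])
    return [1.0 if i == top else 0.0 for i in range(len(u))]

def fcm_step_hardened(points, centers, m=2.0):
    """An FCM center update in which harden(u) replaces u ** m as the weight.
    The result matches a k-means step, but the membership vectors are still
    computed and every (point, center) pair is still visited."""
    k, dim = len(centers), len(centers[0])
    num = [[0.0] * dim for _ in range(k)]
    den = [0.0] * k
    for x in points:
        w = harden(fcm_memberships(x, centers, m))  # in lieu of u ** m
        for j in range(k):
            den[j] += w[j]
            for d in range(dim):
                num[j][d] += w[j] * x[d]
    # The caveat mentioned above: a cluster with no points has den == 0,
    # so its center is undefined; here it simply keeps its old position.
    return [[v / den[j] for v in num[j]] if den[j] else list(centers[j])
            for j in range(k)]

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(fcm_step_hardened(points, [(0.0, 0.0), (10.0, 0.0)]))
# → [[0.0, 0.5], [10.0, 0.5]], the plain k-means means of the two groups
```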

With regard to performance, when updating the centers FCM therefore needs to perform k (i.e. the number of clusters) multiplications for each point, for each dimension, since every point contributes to every center (not counting the exponentiation needed to take the stiffness into account), whereas k-means adds each point to a single center. This, plus the overhead of computing and managing the proximity (membership) vectors, explains why FCM is quite a bit slower than plain k-means.
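
A rough count for the center-update step alone, with n points in d dimensions and k clusters (my own back-of-the-envelope accounting, ignoring the distance computations both algorithms share and FCM's membership update):

$$
\text{k-means: } \approx n\,d \text{ additions} + k\,d \text{ divisions}
\qquad
\text{FCM: } n\,k \text{ powers } u_{ij}^{\,m} + \approx n\,k\,d \text{ multiplications} + \approx n\,k\,d \text{ additions} + k\,d \text{ divisions}
$$

With hypothetical figures n = 10,000, d = 10 and k = 5, that is about 100,000 additions for k-means against about 500,000 multiplications (plus as many additions) for FCM, i.e. roughly k times the work for this step.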

But FCM/soft k-means is less "stupid" than hard k-means when it comes to, for example, elongated clusters (where points that are otherwise consistent in other dimensions tend to scatter along one or two particular dimensions), and that's why it's still around ;-)

From my original reply:

Also, I just thought about this but haven't put any "mathematical" thought into it: FCM may converge faster than hard k-means, somewhat offsetting its bigger computational requirements.

May 2018 edit:

There is actually no reputable research that I could identify which supports my above hunch about FCM's faster rate of convergence. Thank you, Benjamin Horn, for keeping me honest ;-)

answered Oct 13 '22 14:10

mjv

Tags: cluster-analysis, k-means, fuzzy-c-means

n0ob