Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Matlab - Gaussian mixture and Fuzzy C-means less accurate than K-means on high-dimensional data (image of 26-dimension vectors)

I took the matlab code from this tutorial Texture Segmentation Using Gabor Filters.

To test clustering algorithms on the resulting multi-dimensional texture responses to gabor filters, I applied Gaussian Mixture and Fuzzy C-means instead of the K-means to compare their results (number of clusters = 2 in all of the cases):

Original image:

Original image

K-means clusters:

L = kmeans(X, 2, 'Replicates', 5);

kmeans

GMM clusters:

options = statset('MaxIter',1000);
gmm = fitgmdist(X, 2, 'Options', options);
L = cluster(gmm, X);

gmm

Fuzzy C-means:

[centers, U] = fcm(X, 2);
[values indexes] = max(U);

Fuzzy C-means

What I've found weird in this case is that K-means clusters are more accurate than those extracted using GMM and Fuzzy C-means.

Can anyone explain to me if the high-dimensionality (L x W x 26: 26 is the number of gabor filters used) of the data given as input to the GMM and the Fuzzy C-means classifiers is what's causing the clustering to be less accurate?

In other words is the GMM and the Fuzzy C-means clustering more sensitive to the dimensionality of the data, than K-means is?

like image 232
Hakim Avatar asked Nov 19 '15 00:11

Hakim


People also ask

Is Gaussian mixture model better than K-Means?

If we compare both algorithms, the Gaussian mixtures seem to be more robust. However, GMs usually tend to be slower than K-Means because it takes more iterations of the EM algorithm to reach the convergence. They can also quickly converge to a local minimum that is not a very optimal solution.

Is GMM always better than K-Means?

The performance of GMM is better than that of K-means. The three clusters in GMM plot are closer to the original ones. Also, we compute the error rate (percentage of misclassified points) which should be the smaller the better. The Error rate of GMM is 0.0333, while that of K-means is 0.1067.

Why does K mean have a higher bias when compared to the Gaussian of mixture model?

K-means has a higher bias than GMM (identity covariance matrix) because it is also a special case. K-mean specifically assumes the hard clustering problem, but GMM does not. Because of this, GMM has stronger estimates for the mean of the centroids.

How can I improve my accuracy GMM?

By increasing the number of clustering centers, GMM will generate a more accurate segmentation result as shown in Fig. 4.8. The subplots (C) and (E) are the segmentation results obtained with three and four cluster centers.

Why is fuzzy C-means better than K-Means?

Fuzzy c-means clustering has can be considered a better algorithm compared to the k-Means algorithm. Unlike the k-Means algorithm where the data points exclusively belong to one cluster, in the case of the fuzzy c-means algorithm, the data point can belong to more than one cluster with a likelihood.


1 Answers

Glad the comment was useful, here are my observations in answer form.

Each of these methods are sensitive to initialization, but k-means is cheating by using 5 'Replicates' and higher quality initialization (k-means++). The rest of the methods appear to be using a single random initialization.

k-means is GMM if you force spherical covariance. So in theory, it shouldn't do much better (it might do slightly better if the true covariance was in fact spherical).

I think most of the discrepancy comes down to initialization. You should be able to test this by using the k-means result as initial conditions for the other algorithms. Or as you tried, run several times using different random seeds and check if there is more variation in GMM and Fuzzy C-means than there is in k-means.

like image 76
kmac Avatar answered Oct 02 '22 07:10

kmac