I took the MATLAB code from this tutorial: Texture Segmentation Using Gabor Filters. To test clustering algorithms on the resulting multi-dimensional texture responses to the Gabor filters, I applied Gaussian Mixture Models and Fuzzy C-means instead of K-means and compared their results (number of clusters = 2 in all cases):
% K-means with 5 restarts (uses the k-means++ initialization by default)
L = kmeans(X, 2, 'Replicates', 5);

% Gaussian mixture model fitted by EM; hard labels from the posterior
options = statset('MaxIter', 1000);
gmm = fitgmdist(X, 2, 'Options', options);
L = cluster(gmm, X);

% Fuzzy C-means; hard labels taken as the maximum membership per point
[centers, U] = fcm(X, 2);
[~, indexes] = max(U);
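For completeness, this is roughly how X above is built from the Gabor responses (a minimal sketch; the variable names here are my own placeholders, not the tutorial's):

% Illustrative setup: gaborResponses holds the L-by-W-by-26 stack of
% Gabor magnitude responses (the name is mine, not the tutorial's).
[nRows, nCols, nFilters] = size(gaborResponses);
% One row per pixel, one column per filter: an (L*W)-by-26 matrix.
X = reshape(gaborResponses, nRows*nCols, nFilters);
% Standardizing each column keeps any single filter from dominating
% the Euclidean distances (implicit expansion needs R2016b or later).
X = (X - mean(X)) ./ std(X);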
What I find weird is that the K-means clusters are more accurate than those extracted using GMM and Fuzzy C-means. Can anyone explain whether the high dimensionality of the data given as input to GMM and Fuzzy C-means (L x W x 26, where 26 is the number of Gabor filters used) is what's causing the clustering to be less accurate? In other words, are GMM and Fuzzy C-means more sensitive to the dimensionality of the data than K-means is?
Glad the comment was useful; here are my observations in answer form.
Each of these methods is sensitive to initialization, but k-means is cheating by using 5 'Replicates' and a higher-quality initialization (k-means++). The other methods appear to be using a single random initialization.
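A sketch of how to level that playing field: fitgmdist has its own 'Replicates' option, while fcm doesn't, so for fcm I'd restart manually and keep the run with the lowest final objective (the restart loop is my own workaround, not part of the original code):

% Give the GMM the same number of restarts k-means got.
options = statset('MaxIter', 1000);
gmm = fitgmdist(X, 2, 'Options', options, 'Replicates', 5);
L_gmm = cluster(gmm, X);

% fcm has no 'Replicates' option, so restart it manually and keep
% the run with the lowest final objective-function value.
fcmOpts = [2.0, 100, 1e-5, 0];   % [exponent, maxIter, minImprove, display]
bestObj = Inf;
for r = 1:5
    [centers, U, objFcn] = fcm(X, 2, fcmOpts);
    if objFcn(end) < bestObj
        bestObj = objFcn(end);
        [~, L_fcm] = max(U);
    end
end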
k-means is GMM if you force spherical covariance, so in theory k-means shouldn't do much better (it might do slightly better if the true covariance really is spherical).
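fitgmdist has no strictly spherical covariance option, but a single diagonal covariance shared across components is the closest built-in restriction. Something like this (my sketch) lets you check how much of the gap covariance flexibility accounts for:

% Restrict the GMM toward the k-means regime: one diagonal covariance
% matrix shared by both components.
sphericalish = fitgmdist(X, 2, ...
    'CovarianceType', 'diagonal', ...
    'SharedCovariance', true, ...
    'Replicates', 5, ...
    'Options', statset('MaxIter', 1000));
L_restricted = cluster(sphericalish, X);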
I think most of the discrepancy comes down to initialization. You should be able to test this by using the k-means result as the initial conditions for the other algorithms. Or, as you tried, run several times with different random seeds and check whether there is more variation in GMM and Fuzzy C-means than there is in k-means.
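Seeding the GMM with the k-means partition is straightforward, since fitgmdist's 'Start' argument accepts a vector of initial component indices, one per row of X (as far as I know, 'Start' can't be combined with 'Replicates'). fcm has no comparable initialization hook in the releases I've used, so there the seed-variation test is the fallback. A minimal sketch:

% Seed EM with the k-means partition instead of a random start.
L_km = kmeans(X, 2, 'Replicates', 5);
gmm = fitgmdist(X, 2, 'Start', L_km, ...
    'Options', statset('MaxIter', 1000));
L_gmm = cluster(gmm, X);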