Kmeans matlab "Empty cluster created at iteration 1" error

Question

I'm using this script to cluster a set of 3D points with MATLAB's kmeans function, but I always get the error "Empty cluster created at iteration 1". The script I'm using:

[G,C] = kmeans(XX, K, 'distance','sqEuclidean', 'start','sample');

XX can be found at this link (XX value), and K is set to 3. Could anyone please advise me why this is happening?

Amro · Accepted Answer

It is simply telling you that during the assign-recompute iterations, a cluster became empty (lost all its assigned points). This is usually caused by a poor cluster initialization, or by the data having fewer inherent clusters than you specified.
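
To make the failure mode concrete, here is a minimal sketch of a single assignment step on toy 1-D data, using only base MATLAB. The data and the initial centroids are made up for illustration. It assumes two of the starting centroids hold the same value, which can happen when rows are sampled from data with many duplicate rows.

```matlab
% Toy 1-D data with many duplicate values (hypothetical, for illustration).
X = [0; 0; 0; 0; 5; 6];

% Hypothetical initial centroids: two sampled rows carry the same value.
C = [0; 0; 5];

% Assignment step: squared distance from every point to every centroid.
D = zeros(numel(X), numel(C));
for k = 1:numel(C)
    D(:,k) = (X - C(k)).^2;
end
[~, idx] = min(D, [], 2);   % on ties, min returns the first (lowest) index

% Count members per cluster; a zero count means the cluster is empty.
counts = accumarray(idx, 1, [numel(C) 1]);
disp(counts.')              % the second centroid receives no points
```

Because every point at 0 ties between the first two centroids and the tie goes to the first, the second cluster ends the step with no members.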

Try changing the initialization method using the start option. kmeans provides four possible techniques to initialize clusters:

sample: sample K points randomly from the data as initial clusters (the default in older releases; recent releases default to k-means++ seeding, 'plus')
uniform: select K points uniformly at random across the range of the data
cluster: perform preliminary clustering on a small random subset of the data
manual: manually specify initial clusters by passing a K-by-P matrix of starting centroids as the start value
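
As a rough sketch of the options listed above, the calls below try each initialization on synthetic data. The data, the variable names, and the starting centroids C0 are assumptions made for illustration; kmeans itself requires the Statistics Toolbox.

```matlab
% Synthetic stand-in for XX (hypothetical data: three loose blobs in 3-D).
rng(0);
X = [randn(100,3); randn(100,3) + 5; randn(100,3) - 5];
K = 3;

% String-valued initializations.
idxSample  = kmeans(X, K, 'start', 'sample');
idxUniform = kmeans(X, K, 'start', 'uniform');
idxCluster = kmeans(X, K, 'start', 'cluster');

% Manual initialization: pass a K-by-P matrix of starting centroids.
C0 = [0 0 0; 5 5 5; -5 -5 -5];
[idxManual, C] = kmeans(X, K, 'start', C0);
```

With well-separated blobs like these, all four starts should settle on similar clusters; the differences matter more on data such as the posted XX.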

You can also try different values of the emptyaction option, which tells MATLAB what to do when a cluster becomes empty.
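
The emptyaction option accepts 'error', 'drop' and 'singleton'. The sketch below shows one way to use them, on made-up data with many duplicate rows. Which value is the default depends on the MATLAB release, so the calls set it explicitly.

```matlab
% Hypothetical data: many duplicate rows plus a small spread-out group.
rng(1);
X = [zeros(200,3); randn(20,3) + 10];
K = 3;

try
    % 'error': kmeans stops if a cluster loses all of its points.
    idx = kmeans(X, K, 'start', 'sample', 'emptyaction', 'error');
catch err
    fprintf('kmeans failed: %s\n', err.message);
    % 'singleton': replace the empty cluster with a new one made of the
    % single point farthest from its centroid.
    idx = kmeans(X, K, 'start', 'sample', 'emptyaction', 'singleton');
end

% 'drop': remove the empty cluster; its row in C is returned as NaN.
[idxDrop, Cdrop] = kmeans(X, K, 'start', 'sample', 'emptyaction', 'drop');
emptyClusters = find(any(isnan(Cdrop), 2));   % indices of dropped clusters
```

Whether the first call fails depends on which rows happen to be sampled, so the try/catch covers both outcomes.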

Ultimately, I think you need to reduce the number of clusters, e.g. try K=2.

I tried to visualize your data to get a feel for it:

load matlab_X.mat
figure('renderer','zbuffer')
line(XX(:,1), XX(:,2), XX(:,3), ...
    'LineStyle','none', 'Marker','.', 'MarkerSize',1)
axis vis3d; view(3); grid on

After some manual zooming/panning, it looks like a silhouette of a person:

[Image: 3d_points]

You can see that the data of 307200 points is really dense and compact, which confirms what I suspected: the data doesn't have that many clusters.

Here is the code I tried:

>> [IDX,C] = kmeans(XX, 3, 'start','uniform', 'emptyaction','singleton');
>> tabulate(IDX)
  Value    Count   Percent
      1    18023      5.87%
      2    264690     86.16%
      3    24487      7.97%

What's more, all the points in cluster 2 are duplicates of a single point ([0 0 0]):

>> unique(XX(IDX==2,:),'rows')
ans =
     0     0     0

The other two clusters look like:

clr = lines(max(IDX));    % one color per cluster
for i=1:max(IDX)
    line(XX(IDX==i,1), XX(IDX==i,2), XX(IDX==i,3), ...
        'Color',clr(i,:), 'LineStyle','none', 'Marker','.', 'MarkerSize',1)
end

[Image: clustered points]

So you might get better clusters if you remove the duplicate points first...
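
If you still need a label for every original row after de-duplicating, one possible approach is to keep the index map that unique returns. This is a sketch on made-up data; the variable names U, ic and idxU are illustrative, and K=2 is chosen only to match the toy data.

```matlab
% Hypothetical stand-in for XX: many copies of [0 0 0] plus distinct points.
rng(2);
XX = [zeros(1000,3); randn(50,3) + 3; randn(50,3) - 3];

% U holds the distinct rows; ic maps each original row to its row in U.
[U, ~, ic] = unique(XX, 'rows');
fprintf('%d of %d rows are duplicates\n', size(XX,1) - size(U,1), size(XX,1));

% Cluster the distinct rows only, then expand the labels back to all rows.
idxU = kmeans(U, 2, 'replicates', 5);
IDX  = idxU(ic);
```

Every duplicate row then inherits the label of its distinct representative, so the duplicates no longer dominate the centroid updates.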

In addition, you have a few outliers that might affect the result of clustering. Visually, I narrowed down the range of the data to the following intervals, which encompass most of the data:

>> xlim([-500 100])
>> ylim([-500 100])
>> zlim([900 1500])

Here is the result after removing duplicate points (over 250K points) and outliers (around 250 data points), and clustering with K=3 (best out of 5 runs with the replicates option):

XX = unique(XX,'rows');
XX(XX(:,1) < -500 | XX(:,1) > 100, :) = [];
XX(XX(:,2) < -500 | XX(:,2) > 100, :) = [];
XX(XX(:,3) < 900 | XX(:,3) > 1500, :) = [];

[IDX,C] = kmeans(XX, 3, 'replicates',5);

with almost an equal split across the three clusters:

>> tabulate(IDX)
  Value    Count   Percent
      1    15605     36.92%
      2    15048     35.60%
      3    11613     27.48%
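
For the replicates option used above, kmeans keeps the run with the lowest total within-cluster sum of distances. A small sketch of how to inspect that total, on made-up data standing in for the cleaned XX:

```matlab
% Hypothetical stand-in for the cleaned XX.
rng(4);
X = [randn(200,3); randn(200,3) + 4; randn(200,3) - 4];

% sumd holds the within-cluster sum of point-to-centroid distances,
% one entry per cluster, for the best of the 5 runs.
[IDX, C, sumd] = kmeans(X, 3, 'replicates', 5);
totalDist = sum(sumd);

% Cluster sizes, comparable to the Count column of tabulate(IDX).
counts = accumarray(IDX, 1, [3 1]);
```

Comparing totalDist across different settings of K or start gives a rough numeric check to go with the visual one.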

Recall that the default distance function is squared Euclidean distance, which explains the shape of the formed clusters.

[Image: final clustering]

FSH · Answer

If you are confident in your choice of k=3, here is the code I wrote to avoid getting an empty cluster:

[IDX,C] = kmeans(XX,3,'distance','cosine','start','sample', 'emptyaction','singleton');

% Repeat while one of the clusters is empty, or while one or more
% clusters have only one member.
while length(unique(IDX))<3 || histc(histc(IDX,[1 2 3]),1)~=0
    [IDX,C] = kmeans(XX,3,'distance','cosine','start','sample', 'emptyaction','singleton');
end
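
A possible variation on the loop above caps the number of retries, so it cannot spin forever on data that never yields an acceptable clustering. The names maxTries and minSize are hypothetical, and the data is a made-up stand-in for XX without all-zero rows, since cosine distance is undefined for a zero vector.

```matlab
% Hypothetical stand-in for XX (three blobs, none near the origin).
rng(3);
XX = [randn(60,3) + 4; randn(60,3) - 4; bsxfun(@plus, randn(60,3), [4 -4 4])];

K        = 3;
maxTries = 20;   % hypothetical cap on the number of attempts
minSize  = 2;    % reject empty or single-member clusters

for attempt = 1:maxTries
    [IDX, C] = kmeans(XX, K, 'distance', 'cosine', 'start', 'sample', ...
                      'emptyaction', 'singleton');
    counts = accumarray(IDX, 1, [K 1]);   % members per cluster
    if all(counts >= minSize)
        break
    end
end

if any(counts < minSize)
    warning('No acceptable clustering found in %d tries.', maxTries);
end
```

Counting with accumarray also avoids the nested histc call, which some readers find hard to parse.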

Tags: matlab, cluster-analysis, k-means
