Agglomerative Clustering in Matlab

Tags: classification, matlab, cluster-analysis, dendrogram

I have a simple 2-dimensional dataset that I wish to cluster in an agglomerative manner (not knowing the optimal number of clusters to use). The only way I've been able to cluster my data successfully is by giving the function a 'maxclust' value.

For simplicity's sake, let's say this is my dataset:

X=[ 1,1;
    1,2;
    2,2;
    2,1;
    5,4;
    5,5;
    6,5;
    6,4 ];

Naturally, I would want this data to form 2 clusters. I understand that if I knew this, I could just say:

T = clusterdata(X,'maxclust',2);

and to find which points fall into each cluster I could say:

cluster_1 = X(T==1, :);

and

cluster_2 = X(T==2, :);

but without knowing that 2 clusters would be optimal for this dataset, how do I cluster these data?

Thanks


asked Nov 04 '11 22:11

Kevin_TA

2 Answers

The whole point of this method is that it represents the clusters it finds as a hierarchy, and it is up to you to decide how much detail you want.

[Figure: agglomerative dendrogram]

Think of this as having a horizontal line intersecting the dendrogram, which moves starting from 0 (each point is its own cluster) all the way to the max value (all points in one cluster). You could:

stop when you reach a predetermined number of clusters
manually position it at a chosen height value
place it where the clusters are too far apart according to the distance criterion (i.e., there is a big jump to the next level)

This can be done by using either the 'maxclust' or the 'cutoff' argument of the CLUSTER/CLUSTERDATA functions.
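As a rough illustration of the "big jump" option, the sketch below builds the tree with linkage, looks for the largest gap between consecutive merge heights, and cuts halfway across that gap. It assumes the Statistics and Machine Learning Toolbox, single linkage with Euclidean distance, and the example dataset from the question; the variable names (heights, cutoffHeight, numClusters) are made up for this example. Note that 'cutoff' uses the inconsistency criterion by default, so 'criterion','distance' is passed explicitly to cut at a height.

```matlab
% Example data from the question: two well-separated unit squares.
X = [1,1; 1,2; 2,2; 2,1; 5,4; 5,5; 6,5; 6,4];

% Build the hierarchy (single linkage, Euclidean distance).
Z = linkage(X, 'single');

% The third column of Z holds the height of each merge.
heights = Z(:, 3);

% Locate the largest jump between consecutive merge heights.
[~, idx] = max(diff(heights));

% Assumption: place the cut halfway across that jump.
cutoffHeight = (heights(idx) + heights(idx + 1)) / 2;

% Cut the tree at that height rather than at an inconsistency value.
T = cluster(Z, 'cutoff', cutoffHeight, 'criterion', 'distance');

numClusters = numel(unique(T));
```

For this dataset the six within-square merges happen at height 1 and the final merge at sqrt(13), so the cut falls between them and yields two clusters. The largest-gap rule is only a heuristic; on noisier data the jump may be much less clear.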

answered Oct 11 '22 20:10

Amro

To choose the optimal number of clusters, one common approach is to make a plot similar to a scree plot, look for the "elbow" in it, and pick the number of clusters at that point. The criterion used here is the within-cluster sum-of-squares:

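function wss = plotScree(X, n)
% PLOTSCREE Plot the within-cluster sum-of-squares for 1 to n clusters.
%   X is an observations-by-variables matrix; n is the largest cluster count to try.

wss = zeros(1, n);
% One cluster: total sum of squares about the overall mean.
wss(1) = (size(X, 1)-1) * sum(var(X, [], 1));
for i=2:n
    T = clusterdata(X,'maxclust',i);
    % (group size - 1) * group variance, summed over variables and clusters.
    wss(i) = sum((grpstats(T, T, 'numel')-1) .* sum(grpstats(X, T, 'var'), 2));
end
hold on
plot(wss)
plot(wss, '.')
xlabel('Number of clusters')
ylabel('Within-cluster sum-of-squares')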

>> plotScree(X, 5)

ans =

   54.0000    4.0000    3.3333    2.5000    2.0000
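If you would rather not read the elbow off the plot by eye, one possible heuristic (an assumption added here, not part of the original answer) is to pick the point where the curve bends most sharply, i.e. where the second difference of wss is largest. The sketch below applies it to the values printed above; d2 and kElbow are hypothetical names.

```matlab
% Within-cluster sum-of-squares for 1..5 clusters, copied from the output above.
wss = [54.0000 4.0000 3.3333 2.5000 2.0000];

% Second differences measure how sharply the curve bends at k = 2..n-1.
d2 = diff(wss, 2);

% The sharpest bend is taken as the elbow; shift by 1 because d2(1) belongs to k = 2.
[~, idx] = max(d2);
kElbow = idx + 1;
```

For these values the drop from 54 to 4 dominates, so the heuristic picks two clusters. It cannot return 1 or n, and it can be misled when the curve has no pronounced bend, so it is worth checking against the plot.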


answered Oct 11 '22 20:10

John Colby
