
Creating clusters in MATLAB

Suppose that I have generated some data in MATLAB as follows:

n = 100;

x = randi(n,[n,1]);
y = rand(n,1);
data = [x y];

plot(x,y,'rx')
axis([0 100 0 1])

Now I want to write an algorithm that groups all of these data points into some clusters (which are arbitrary), such that a point is a member of a cluster only if the distance between that point and at least one other member of the cluster is less than 10. How could I write the code?

MMd.NrC asked Mar 02 '23 23:03


2 Answers

The clustering method you are describing is DBSCAN with the minimum number of points set to 1. Note that this algorithm will most likely find only one cluster in the data you provided, since it is very unlikely that any point, or group of points, lies more than 10 away from all the others. If this is really what you want, you can use the built-in dbscan function (available since R2019a in the Statistics and Machine Learning Toolbox), or the DBSCAN implementation posted on the File Exchange (FE) if you are using a version older than R2019a.

% Generating random points, almost similar to the data provided by OP 
data = bsxfun(@times, rand(100, 2), [100 1]);
% Adding more random points
for i=1:5
    mu = rand(1, 2)*100 -50;
    A = rand(2)*5;
    sigma = A*A'+eye(2)*(1+rand*2);%[1,1.5;1.5,3];
    data = [data;mvnrnd(mu,sigma,20)];
end
% clustering using DBSCAN, with epsilon = 10 and min-points = 1
idx = DBSCAN(data, 10, 1);
% plotting clusters
numCluster = max(idx);
colors = lines(numCluster);
scatter(data(:, 1), data(:, 2), 30, colors(idx, :), 'filled')
title(['No. of Clusters: ' num2str(numCluster)])
axis equal

[Figure: scatter plot of the clusters found by DBSCAN]

The numbers in the figure above show the distance between the closest pair of points in any two different clusters.
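With min-points set to 1, the rule in the question amounts to finding the connected components of a graph that links every pair of points closer than 10. The following is only a minimal, toolbox-free sketch of that idea, not the DBSCAN code used above. It assumes base MATLAB R2016b or later (for implicit expansion and graph/conncomp), and the names D, A and G, as well as the fixed random seed, are illustrative choices.

```matlab
rng(0);                                  % assumption: fixed seed for repeatability
n = 100;
data = [randi(n, [n, 1]), rand(n, 1)];   % same kind of data as in the question

% Pairwise Euclidean distances using base MATLAB only
dx = data(:, 1) - data(:, 1)';
dy = data(:, 2) - data(:, 2)';
D  = sqrt(dx.^2 + dy.^2);

% Link two distinct points whenever they are closer than 10
A = (D < 10) & ~eye(n);

% Each connected component of that graph is one cluster
G   = graph(A);
idx = conncomp(G)';                      % column vector of cluster labels
numCluster = max(idx);

scatter(data(:, 1), data(:, 2), 30, idx, 'filled')
title(['No. of Clusters: ' num2str(numCluster)])
```

On the question's data this will usually report a single cluster, for the reason given in the answer. The full n-by-n distance matrix is fine for a few thousand points but would not scale to very large data sets.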

saastn answered Mar 11 '23 03:03


The MATLAB function clusterdata() (part of the Statistics and Machine Learning Toolbox) works well for what you're asking.

Here is how to apply it to your example:

% number of points
n = 100; 

% create the data
x = randi(n,[n,1]);
y = rand(n,1);
data = [x y]; 

% the number of clusters you want to create
num_clusters = 5; 

T1 = clusterdata(data, 'Criterion', 'distance', ...
    'Distance', 'euclidean', ...
    'MaxClust', num_clusters);

scatter(x, y, 100, T1,'filled')

In this case, I used 5 clusters and the Euclidean distance as the metric for grouping the data points, but you can always change that (see the documentation of clusterdata()).
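One such change, closer to the distance rule in the question, is to cut the cluster tree at a distance threshold instead of fixing the number of clusters. The sketch below assumes the Statistics and Machine Learning Toolbox and uses the 'Cutoff' and 'Linkage' options of clusterdata(). The threshold of 10 comes from the question, while the variable names T2 and numClusters and the fixed seed are illustrative.

```matlab
rng(0);                                  % assumption: fixed seed for repeatability
n = 100;
data = [randi(n, [n, 1]), rand(n, 1)];

% Single linkage merges clusters by their closest pair of points, and
% 'Criterion','distance' with 'Cutoff' keeps only merges below that height.
T2 = clusterdata(data, 'Linkage', 'single', ...
    'Distance', 'euclidean', ...
    'Criterion', 'distance', ...
    'Cutoff', 10);

numClusters = max(T2);
scatter(data(:, 1), data(:, 2), 100, T2, 'filled')
title(['No. of Clusters: ' num2str(numClusters)])
```

With this setup the number of clusters is an output rather than an input, so on densely sampled data it may well collapse to a single cluster.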

See the result below for 5 clusters with some random data.

[Figure: scatter plot of the random data colored by the 5 clusters]

Note that the two coordinates are on very different scales (x-values run from 1 to 100, and y-values from 0 to 1), so the Euclidean distance is dominated by x and the clusters are skewed accordingly, but you could always normalize your data first.
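As a rough sketch of that normalization step, each column can be rescaled to zero mean and unit standard deviation before clustering. This assumes MATLAB R2016b or later for implicit expansion, and the names dataN and T3 are illustrative. Z-scoring is only one possible choice of normalization.

```matlab
rng(0);                                  % assumption: fixed seed for repeatability
n = 100;
x = randi(n, [n, 1]);
y = rand(n, 1);
data = [x y];

% Z-score each column so that x and y contribute comparably to the distance
dataN = (data - mean(data)) ./ std(data);

% Cluster the normalized data, but plot in the original coordinates
T3 = clusterdata(dataN, 'Distance', 'euclidean', 'MaxClust', 5);
scatter(x, y, 100, T3, 'filled')
```

After normalization the distances are measured in standard deviations, so a threshold such as 10 in the original units would have to be rescaled as well.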

juju89 answered Mar 11 '23 01:03