Getting the index of the closest data point to each centroid in k-means clustering in MATLAB

I am doing some clustering using k-means in MATLAB. As you might know, the usage is as follows:

[IDX,C] = kmeans(X,k)

where IDX gives the cluster number for each data point in X, and C gives the centroid of each cluster. I need to get the index (row number in the actual data set X) of the data point closest to each centroid. Does anyone know how I can do that? Thanks
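For reference, `kmeans` can also return a fourth output, `D`, the n-by-k matrix of distances from every point to every centroid (squared Euclidean with the default `'sqeuclidean'` distance). A minimal loop-free sketch, assuming that output: set the distances to centroids of *other* clusters to `Inf`, then take the minimum of each column. The small `IDX` and `D` values below are made up for illustration and stand in for real `kmeans` output.

```matlab
% In practice these would come from (Statistics and Machine Learning Toolbox):
%   [IDX, C, sumd, D] = kmeans(X, k);
% Hypothetical stand-in values: 5 points, 2 clusters.
IDX = [1; 1; 2; 2; 1];
D   = [0.5 9.0;
       0.2 8.0;
       7.0 0.9;
       6.0 0.1;
       0.8 5.0];

k = size(D, 2);
% notMember(i,j) is true when point i is NOT assigned to cluster j
notMember = bsxfun(@ne, IDX, 1:k);
D(notMember) = Inf;               % ignore points belonging to other clusters
[~, closestIdx] = min(D, [], 1);  % per column: row of the closest member point
closestIdx = closestIdx(:);       % closestIdx(j) is a row number in X
```

Masking with `Inf` keeps the search restricted to each cluster's own members, which matches the question's requirement that the returned point belong to that cluster.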

Hossein asked Dec 09 '10 15:12



People also ask

What is the cluster index in k-means?

k-means clustering is a partitioning method. The function kmeans partitions data into k mutually exclusive clusters and returns the index of the cluster to which it assigns each observation. kmeans treats each observation in your data as an object that has a location in space.
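To make "the index of the cluster to which it assigns each observation" concrete: once the centroids are fixed, each observation gets the index of its nearest centroid. A minimal sketch of that assignment rule under squared Euclidean distance, with made-up `X` and `C` (this is not the toolbox's internal code):

```matlab
% Hypothetical data: 4 two-dimensional points and 2 centroids
X = [0 0; 0 1; 10 10; 9 10];
C = [0 0.5; 9.5 10];

n = size(X, 1);
k = size(C, 1);
D = zeros(n, k);
for j = 1:k
    % squared Euclidean distance from every point to centroid j
    D(:, j) = sum(bsxfun(@minus, X, C(j, :)).^2, 2);
end
% IDX(i) is the index of the centroid nearest to point i
[~, IDX] = min(D, [], 2);
```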

How do you find the optimal value of k in k-means?

A popular heuristic is the elbow method. You run k-means for a range of k values and plot the cost (typically the total within-cluster sum of squared distances) against k. As k increases, each cluster contains fewer points and the cost generally decreases; the "elbow", where the curve flattens out, is taken as a good choice of k, because adding more clusters beyond it reduces the cost only a little.
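A minimal sketch of the cost used in the elbow method for one clustering, with made-up `X`, `IDX` and `C`. Indexing `C(IDX,:)` picks out each point's own centroid row. The commented-out loop shows how the elbow curve could be built with `kmeans` itself, assuming the default squared Euclidean distance, for which `sum(sumd)` is the same total.

```matlab
% Hypothetical clustering result: 4 points, 2 clusters
X   = [0 0; 0 2; 10 10; 10 12];
IDX = [1; 1; 2; 2];
C   = [0 1; 10 11];

% C(IDX,:) repeats each point's own centroid, one row per point
residuals = X - C(IDX, :);
% total within-cluster sum of squared distances (the elbow "cost")
cost = sum(residuals(:).^2);

% Elbow curve on a real data set (needs Statistics and Machine Learning Toolbox):
%   maxK = 8;
%   costs = zeros(maxK, 1);
%   for k = 1:maxK
%       [~, ~, sumd] = kmeans(X, k);
%       costs(k) = sum(sumd);
%   end
%   plot(1:maxK, costs, '-o'), xlabel('k'), ylabel('within-cluster SSE')
```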

How do you find the centroid of a cluster of points?

To calculate the centroid of a cluster, take the coordinates of all points assigned to that cluster, sum them, and divide by the number of points; in other words, take the coordinate-wise mean.
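The same rule in MATLAB: a minimal sketch, with made-up `X` and `IDX`, that computes every cluster's centroid as the mean of its member points. For the default squared Euclidean distance, `kmeans` likewise takes each centroid to be the mean of the points assigned to it.

```matlab
% Hypothetical data and cluster assignments
X   = [0 0; 2 0; 10 10; 12 14; 11 12];
IDX = [1; 1; 2; 2; 2];

k = max(IDX);
C = zeros(k, size(X, 2));
for j = 1:k
    members = X(IDX == j, :);
    % sum the coordinates of the members and divide by their count
    C(j, :) = sum(members, 1) / size(members, 1);
end
```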


2 Answers

The "brute-force approach" mentioned by @Dima would go as follows:

%# number of clusters (one centroid per row of C)
nClusters = size(C,1);
%# preallocate the result
closestIdx = zeros(nClusters,1);
%# loop through all clusters
for iCluster = 1:nClusters
    %# find the points that are part of the current cluster
    currentPointIdx = find(IDX==iCluster);
    %# find the index (among points in the cluster)
    %# of the point that has the smallest Euclidean distance from the centroid
    %# bsxfun subtracts coordinates, then you sum the squares of
    %# the distance vectors, then you take the minimum
    [~,minIdx] = min(sum(bsxfun(@minus,X(currentPointIdx,:),C(iCluster,:)).^2,2));
    %# store the index into X (among all the points)
    closestIdx(iCluster) = currentPointIdx(minIdx);
end

To get the coordinates of the point closest to the center of cluster k, use:

X(closestIdx(k),:)
Jonas answered Sep 19 '22 23:09



The brute-force approach would be to run k-means, then compare each data point in a cluster with that cluster's centroid and pick the closest one. This is easy to do in MATLAB.

On the other hand, you may want to try the k-medoids clustering algorithm, which gives you an actual data point as the "center" of each cluster. Here is a MATLAB implementation.
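A k-medoids "center" is a medoid: the member point with the smallest total distance to the other members of its cluster, so it is always a row of X. A minimal sketch of picking the medoid of each cluster for a fixed assignment, with made-up `X` and `IDX` and Euclidean distance. This is only the medoid-selection step, not a full k-medoids algorithm (which also reassigns points and iterates); later MATLAB releases also provide a `kmedoids` function in the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical data and cluster assignments
X   = [0 0; 1 0; 5 0; 20 20; 21 20; 22 20];
IDX = [1; 1; 1; 2; 2; 2];

k = max(IDX);
medoidIdx = zeros(k, 1);
for j = 1:k
    members = find(IDX == j);
    P = X(members, :);
    m = numel(members);
    % pairwise Euclidean distances between the members of cluster j
    pairDist = zeros(m);
    for a = 1:m
        pairDist(:, a) = sqrt(sum(bsxfun(@minus, P, P(a, :)).^2, 2));
    end
    % the medoid minimizes the total distance to the other members
    [~, best] = min(sum(pairDist, 1));
    medoidIdx(j) = members(best);   % row number in X
end
```

Unlike the point closest to the centroid, the medoid does not depend on a centroid at all, which is why it also works with distances for which a mean is not meaningful.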

Dima answered Sep 20 '22 23:09
