
Ordering clustered points using Kmeans and R

I have a set of data (5000 points with 4 dimensions) that I have clustered using kmeans in R.

I want to order the points in each cluster by their distance to the center of that cluster.

Very simply, the data looks like this (I am using a subset to test out various approaches):

id  Ans Acc Que Kudos
1   100 100 100 100
2   85  83  80  75
3   69  65  30  29
4   41  45  30  22 
5   10  12  18  16
6   10  13  10  9
7   10  16  16  19
8   65  68  100 100
9   36  30  35  29
10  36  30  26  22

Firstly, I used the following method to cluster the dataset into 2 clusters:

(result <- kmeans(data, 2))

This returns a kmeans object with components such as cluster (the cluster assigned to each point) and centers (the coordinates of each cluster center).

But I cannot figure out how to compare each point and produce an ordered list.
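One way to make that comparison, sketched here under a few assumptions: the ten sample rows are retyped by hand, the `id` column is left out of the clustering, and the names `vars`, `diffs` and `ordered` are illustrative only. Indexing `result$centers` by `result$cluster` gives each point its own center row, so the distance is a row-wise calculation and the final sort is by cluster, then by distance.

```r
# The ten sample rows from the question, rebuilt by hand
data <- data.frame(
  id    = 1:10,
  Ans   = c(100, 85, 69, 41, 10, 10, 10, 65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16, 68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16, 9, 19, 100, 29, 22)
)
vars <- c("Ans", "Acc", "Que", "Kudos")  # assumption: cluster on these, not on id

set.seed(1)  # kmeans starts from random centers
result <- kmeans(data[, vars], centers = 2, nstart = 10)

# result$centers[result$cluster, ] repeats each point's own center row,
# so the subtraction lines up row by row
diffs <- as.matrix(data[, vars]) - result$centers[result$cluster, ]
data$cluster <- result$cluster
data$dist <- sqrt(rowSums(diffs^2))  # Euclidean distance to the assigned center

# Sort by cluster, then by distance to that cluster's center
ordered <- data[order(data$cluster, data$dist), ]
ordered
```

The first row within each cluster is then the point closest to that cluster's center.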

Secondly, I tried the seriation approach as suggested by another SO user here

I use these commands:

clus <- kmeans(scale(x, scale = FALSE), centers = 3, iter.max = 50, nstart = 10)
mns <- sapply(split(x, clus$cluster), function(x) mean(unlist(x)))
result <- x[order(order(mns)[clus$cluster]), ]

Which seems to produce an ordered list but if I bind it to the labeled clusters (using the following cbind command):

result <- cbind(x[order(order(mns)[clus$cluster]), ],clus$cluster)

I get the following result, which does not appear to be ordered correctly:

    id  Ans  Acc  Que  Kudos  clus
1    3   69   65   30     29     1
2    4   41   45   30     22     1
3    5   10   12   18     16     2
4    6   10   13   10      9     2
5    7   10   16   16     19     2
6    9   36   30   35     29     2
7   10   36   30   26     22     2
8    1  100  100  100    100     1
9    2   85   83   80     75     2
10   8   65   68  100    100     2
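Two things seem to be going on in that output. The `order(order(mns)[clus$cluster])` permutation groups rows by cluster (ordered by cluster mean) but does not sort points within a cluster, and the `cbind` attaches `clus$cluster` in its original row order to rows that have already been permuted, so the labels no longer match their rows. A small sketch of the second point, assuming the same ten rows with `id` left out of the clustering and k = 2 (the names `perm`, `bad` and `good` are illustrative):

```r
x <- data.frame(
  id    = 1:10,
  Ans   = c(100, 85, 69, 41, 10, 10, 10, 65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16, 68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16, 9, 19, 100, 29, 22)
)
vars <- c("Ans", "Acc", "Que", "Kudos")

set.seed(1)
clus <- kmeans(x[, vars], centers = 2, nstart = 10)

# One overall mean per cluster, then a permutation that groups rows by cluster
mns  <- sapply(split(x[, vars], clus$cluster), function(d) mean(unlist(d)))
perm <- order(order(mns)[clus$cluster])

# Misaligned: the rows are permuted but the labels are not
bad  <- cbind(x[perm, ], clus = clus$cluster)

# Aligned: attach the labels first, then permute rows and labels together
good <- cbind(x, clus = clus$cluster)[perm, ]
good
```

Even when aligned, this only groups the clusters; ordering within each cluster still needs a per-point distance to the center.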

I don't want to write commands willy-nilly; I want to understand how the approach works. If anyone could help out or shed some light on this, it would be really great.

EDIT:

As the clusters can be easily plotted, I'd imagine there is a more straightforward way to get and rank the distances between points and the center.

The centers for the above clusters (when using k = 2) are as follows. But I do not know how to get and compare this with each individual point.

     Ans    Accep     Que      Kudos
1 83.33333 83.66667 93.33333 91.66667
2 30.28571 30.14286 23.57143 20.85714 

NB:

I don't need to use kmeans, but I want to specify the number of clusters and retrieve an ordered list of points from those clusters.
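As one possible non-kmeans route, hierarchical clustering with `hclust` and `cutree` also lets the number of clusters be fixed up front. The sketch below is only an illustration: it uses the built-in `iris` measurements as a stand-in for a 4-dimensional dataset, picks Ward linkage and k = 3 arbitrarily, and takes each cluster's center to be the mean of its members.

```r
pts <- as.matrix(iris[, 1:4])   # stand-in data: 150 points, 4 dimensions

tree <- hclust(dist(pts), method = "ward.D2")
grp  <- cutree(tree, k = 3)     # the number of clusters is chosen here

# Cluster centers as per-cluster column means
# (rowsum() gives per-cluster column sums, divided row by row by cluster size)
centers <- rowsum(pts, grp) / as.vector(table(grp))

# Distance from each point to the center of its own cluster
d <- sqrt(rowSums((pts - centers[grp, , drop = FALSE])^2))

out <- data.frame(pts, cluster = grp, dist = d)
ordered <- out[order(out$cluster, out$dist), ]
head(ordered)
```

The ordering step is the same whichever clustering method produced the labels, as long as a center can be computed for each cluster.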

slotishtype asked Apr 09 '12 14:04

1 Answer

Here is an example that does what you ask, using the first example from ?kmeans. It is probably not terribly efficient, but is something to build upon.

#Taken straight from ?kmeans
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
cl <- kmeans(x, 2)

x <- cbind(x,cl = cl$cluster)

#Function to apply to each cluster to 
# do the ordering
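orderCluster <- function(i, data, centers) {
    #Extract cluster and center
    #(drop = FALSE keeps a one-row cluster as a matrix)
    dt <- data[data[, 3] == i, , drop = FALSE]
    ct <- centers[i, ]

    #Calculate squared Euclidean distances to the center;
    #sweep() subtracts ct[1] from column 1 and ct[2] from column 2,
    #whereas dt[, 1:2] - ct would recycle ct down the columns
    dt <- cbind(dt, dist = rowSums(sweep(dt[, 1:2, drop = FALSE], 2, ct)^2))
    #Sort
    dt[order(dt[, 4]), , drop = FALSE]
}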

do.call(rbind,lapply(sort(unique(cl$cluster)),orderCluster,data = x,centers = cl$centers))
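One detail worth checking in any version of the distance step: when a center vector is subtracted from a matrix of points with a plain `-`, R recycles the vector down the columns rather than across each row. `sweep()` with `MARGIN = 2` applies one center coordinate per column. A tiny illustration with made-up numbers (the points and the center `ct` are hypothetical):

```r
pts <- matrix(c(0, 0,
                1, 1,
                2, 2), ncol = 2, byrow = TRUE)
ct <- c(10, 20)   # hypothetical center: x = 10, y = 20

pts - ct            # recycled down the columns, so rows get mixed coordinates
sweep(pts, 2, ct)   # column 1 minus ct[1], column 2 minus ct[2]

# Squared distance of each point to the center
sq_dist <- rowSums(sweep(pts, 2, ct)^2)
sq_dist
```

Squared distances give the same ordering as Euclidean distances, so taking the square root is only needed if the actual distances are reported.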

joran answered Oct 13 '22 09:10