Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the center of a cluster

I have the following problem - made abstract to bring out the key issues.

I have 10 points each which is some distance from the other. I want to

  1. be able to find the center of the cluster i.e. the point for which the pairwise distance to each other point is minimised,
    let p(j) ~ p(k) represent the pairwise distance beteen points j and k
    p(i) is center-point of the cluster iff p(i) s.t. min[sum(p(j)~p(k))] for all 0 < j,k <= n where we have n points in the cluster
  2. determine how to split the cluster in to two clusters once the number of data points in the cluster goes above some threshold t.

This is not euclidean space. But the distances can be summarised as follows - p(i) is point i:

       p(1)    p(2)    p(3)    p(4)    p(5)    p(6)    p(7)    p(8)    p(9)    p(10)
p(1)    0       2       1       3       2       3       3       2       3        4
p(2)    2       0       1       3       2       3       3       2       3        4
p(3)    1       1       0       2       0       1       2       1       2        3
p(4)    3       3       2       0       1       2       3       2       3        4      
p(5)    2       2       1       1       0       1       2       1       2        3   
p(6)    3       3       2       2       1       0       3       2       3        4   
p(7)    3       3       2       3       2       3       0       1       2        3  
p(8)    2       2       1       2       1       2       1       0       1        2 
p(9)    3       3       2       3       2       3       2       1       0        1
p(10)   4       4       3       4       3       4       3       2       1        0 

How would I calculate which is the center point of this cluster?

like image 270
Ankur Avatar asked Aug 10 '09 08:08

Ankur


2 Answers

As far as I understand this looks like K Means Clustering, and what you are looking for is usually known as 'Medoids'.

See here: http://en.wikipedia.org/wiki/Medoids or here: http://en.wikipedia.org/wiki/K-medoids

like image 83
Darknight Avatar answered Sep 22 '22 12:09

Darknight


I may be about to have that frisson that comes just before displaying utter stupidity. But doesn't this yield easily to brute force? In Python:

distances = [
[ 0 , 2 , 1 , 3 , 2 , 3 , 3 , 2 , 3 , 4 , ],
[ 2 , 0 , 1 , 3 , 2 , 3 , 3 , 2 , 3 , 4 , ],
[ 1 , 1 , 0 , 2 , 0 , 1 , 2 , 1 , 2 , 3 , ],
[ 3 , 3 , 2 , 0 , 1 , 2 , 3 , 2 , 3 , 4 , ],
[ 2 , 2 , 1 , 1 , 0 , 1 , 2 , 1 , 2 , 3 , ],
[ 3 , 3 , 2 , 2 , 1 , 0 , 3 , 2 , 3 , 4 , ],
[ 3 , 3 , 2 , 3 , 2 , 3 , 0 , 1 , 2 , 3 , ],
[ 2 , 2 , 1 , 2 , 1 , 2 , 1 , 0 , 1 , 2 , ],
[ 3 , 3 , 2 , 3 , 2 , 3 , 2 , 1 , 0 , 1 , ],
[ 4 , 4 , 3 , 4 , 3 , 4 , 3 , 2 , 1 , 0 , ],
]

currentMinimum = 99999

for point in range ( 10 ) :
    distance_sum = 0
    for second_point in range ( 10 ) :
        if point == second_point : continue
        distance_sum += distances [ point ] [ second_point ]
    print '>>>>>', point, distance_sum 

    if distance_sum < currentMinimum :
        currentMinimum = distance_sum 
        centre = point

print centre
like image 22
Bill Bell Avatar answered Sep 25 '22 12:09

Bill Bell