Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding clusters of mass in a matrix/bitmap

This is in continuation with the question posted here: Finding the center of mass on a 2D bitmap which talked about finding the center of mass in a boolean matrix, as the example was given.

Suppose now we expand the matrix to this form:

0 1 2 3 4 5 6 7 8 9
1 . X X . . . . . .
2 . X X X . . X . .
3 . . . . . X X X .
4 . . . . . . X . .
5 . X X . . . . . .
6 . X . . . . . . .
7 . X . . . . . . .
8 . . . . X X . . .
9 . . . . X X . . .

As you can see we now have 4 centers of mass, for 4 different clusters.

We already know how to find a center of mass given that only one exists, if we run that algorithm on this matrix we'll get some point in the middle of the matrix which does not help us.

What can be a good, correct and fast algorithm to find these clusters of mass?

like image 654
Yuval Adam Avatar asked Jan 04 '09 22:01

Yuval Adam


1 Answers

I think I would check each point in the matrix and figure out it's mass based on it's neighbours. The mass for points would fall with say the square of the distance. You could then pick the top four points with a minimum distance from each other.

Here's some Python code I whipped together to try to illustrate the approach for finding out the mass for each point. Some setup using your example matrix:

matrix = [[1.0 if x == "X" else 0.0 for x in y] for y in """.XX......
.XXX..X..
.....XXX.
......X..
.XX......
.X.......
.X.......
....XX...
....XX...""".split("\n")]

HEIGHT = len(matrix)
WIDTH = len(matrix[0])
Y_RADIUS = HEIGHT / 2
X_RADIUS = WIDTH / 2

To calculate the mass for a given point:

def distance(x1, y1, x2, y2):
  'Manhattan distance http://en.wikipedia.org/wiki/Manhattan_distance'
  return abs(y1 - y2) + abs(x1 - x2)

def mass(m, x, y):
  _mass = m[y][x]
  for _y in range(max(0, y - Y_RADIUS), min(HEIGHT, y + Y_RADIUS)):
    for _x in range(max(0, x - X_RADIUS), min(WIDTH, x + X_RADIUS)):
      d = max(1, distance(x, y, _x, _y))
      _mass += m[_y][_x] / (d * d)
  return _mass

Note: I'm using Manhattan distances (a k a Cityblock, a k a Taxicab Geometry) here because I don't think the added accuracy using Euclidian distances is worth the cost of calling sqrt().

Iterating through our matrix and building up a list of tuples like (x, y, mass(x,y)):

point_mass = []
for y in range(0, HEIGHT):
  for x in range(0, WIDTH):
    point_mass.append((x, y, mass(matrix, x, y)))

Sorting the list on the mass for each point:

from operator import itemgetter
point_mass.sort(key=itemgetter(2), reverse=True)

Looking at the top 9 points in that sorted list:

(6, 2, 6.1580555555555554)
(2, 1, 5.4861111111111107)
(1, 1, 4.6736111111111107)
(1, 4, 4.5938888888888885)
(2, 0, 4.54)
(4, 7, 4.4480555555555554)
(1, 5, 4.4480555555555554)
(5, 7, 4.4059637188208614)
(4, 8, 4.3659637188208613)

If we would work from highest to lowest and filter away points that are too close to already seen points we'll get (I'm doing it manually since I've run out of time now to do it in code...):

(6, 2, 6.1580555555555554)
(2, 1, 5.4861111111111107)
(1, 4, 4.5938888888888885)
(4, 7, 4.4480555555555554)

Which is a pretty intuitive result from just looking at your matrix (note that the coordinates are zero based when comparing with your example).

like image 155
PEZ Avatar answered Sep 29 '22 11:09

PEZ