Given a list of X,Y coordinate points on a 2D grid, what is the most efficient algorithm to create a list of groups of adjacent coordinate points?
For example, given a list of points making up two non-adjacent squares (3x3) on a grid (15x15), the result of this algorithm would be two groups of points corresponding to the two squares.
I suppose you could do a flood fill algorithm, but this seems like overkill and not very efficient for a large 2D array of, say, 1024x1024.
Fundamentally, this is an image-processing operation. If you use an image-processing library like scikit-image (a.k.a. skimage), it will be easy. Dealing with really huge data will eventually get slow, but 1024x1024 is nothing.
In [1]: import numpy as np
In [2]: import skimage.morphology
In [3]: x = [0,1,2,0,1,2,0,1,2,-3,-2,-1,-3,-2,-1,-3,-2,-1]
In [4]: y = [0,0,0,1,1,1,2,2,2,-3,-3,-3,-2,-2,-2,-1,-1,-1]
In [5]: dense = np.zeros((9,9), dtype=bool)
In [6]: dense[y,x] = True
In [7]: print(dense)
[[ True  True  True False False False False False False]
 [ True  True  True False False False False False False]
 [ True  True  True False False False False False False]
 [False False False False False False False False False]
 [False False False False False False False False False]
 [False False False False False False False False False]
 [False False False False False False  True  True  True]
 [False False False False False False  True  True  True]
 [False False False False False False  True  True  True]]
In [8]: labeled = skimage.morphology.label(dense)
In [9]: print(labeled)
[[1 1 1 0 0 0 0 0 0]
 [1 1 1 0 0 0 0 0 0]
 [1 1 1 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 2 2 2]
 [0 0 0 0 0 0 2 2 2]
 [0 0 0 0 0 0 2 2 2]]
In [10]: coords_yx = { i: (labeled == i).nonzero() for i in range(1,labeled.max()+1) }
In [11]: coords_yx
Out[11]:
{1: (array([0, 0, 0, 1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2, 0, 1, 2])),
2: (array([6, 6, 6, 7, 7, 7, 8, 8, 8]), array([6, 7, 8, 6, 7, 8, 6, 7, 8]))}
You can hash all coordinate points (e.g. using a dictionary in Python) and then, for each point, hash its adjacent neighbors to find pairs of adjacent points and "merge" them. For each point, maintain a pointer to the connected component that point belongs to (again using a dictionary), and for each connected component maintain a list of the points that belong to it.
Then, when you hash a neighbor of a point and find a match, merge the two connected components the points belong to and update the component pointers for all points in the resulting union. You can show that you only need to hash the neighbors of each point once to find all connected components; furthermore, if you always relabel the points of the smaller of the two components being merged, the run time is near-linear in the number of points (each point is relabeled at most O(log n) times).
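As a rough sketch of the approach described above (function and variable names are my own, not from any library): hash the points, probe each point's 4-neighbors, and union the components of adjacent pairs, always relabeling the smaller component.

```python
def group_adjacent(points):
    """Group (x, y) points into 4-connected components.

    Returns a list of lists, one per component.
    """
    # Each point starts out in its own component.
    component = {p: p for p in points}   # point -> representative point
    members = {p: [p] for p in points}   # representative -> member list

    def merge(a, b):
        ra, rb = component[a], component[b]
        if ra == rb:
            return
        # Relabel the smaller component, so each point moves O(log n) times.
        if len(members[ra]) < len(members[rb]):
            ra, rb = rb, ra
        for p in members[rb]:
            component[p] = ra
        members[ra].extend(members.pop(rb))

    for (x, y) in component:
        # Probing only the right and down neighbors suffices: every
        # adjacent pair is still examined exactly once.
        for nb in ((x + 1, y), (x, y + 1)):
            if nb in component:
                merge((x, y), nb)

    return list(members.values())
```

For the two 3x3 squares from the question, `group_adjacent` returns two groups of nine points each.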
It's unclear what you mean by "groups of adjacent" coordinate points. Your example of two non-adjacent 3x3 squares suggests you are looking for what is called connected components labeling.
There are many existing implementations for extracting connected components, e.g. in OpenCV and scikit-image.
However, I've implemented this kind of blob detector myself, and they are not that hard to write if you are looking for a learning experience. If not, I would go with a mature library like OpenCV and use its Python API, if that's all you need.
Also, you mentioned "efficiency". Note that there are single-pass and two-pass versions of these algorithms. Single-pass, as the name suggests, is generally more efficient as it only requires a single sweep through the data. This may matter if your grids are very large.
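As a minimal illustration of the single-sweep idea (a sketch, with names of my own choosing): scan the grid once and, whenever an unlabeled foreground cell is found, flood-fill (BFS) its entire component before continuing the scan. Every cell is touched a constant number of times, so the total work is linear in the grid size.

```python
from collections import deque

def label_grid(grid):
    """Label 4-connected components in a grid of bools.

    Returns a parallel grid of int labels, with 0 meaning background.
    """
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] and labels[y][x] == 0:
                # New component found: flood-fill it with a fresh label.
                current += 1
                labels[y][x] = current
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and grid[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels
```

On the 9x9 example from the accepted answer, this assigns label 1 to the top-left square and label 2 to the bottom-right one.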