Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which spatial data structure (algorithm) is best for (searching in) a set of regions (spacial data)?

I have a set of regions (geo-fences) which are polygons. This set of data is fixed; so there is no need for insertion and deletion of data. Which data structure can be used for searching for regions that a query point (longitude, latitude) is in it?

Note: I have implemented KD-Tree (In fact a 2D-Tree) successfully for a set of points. But it does not work for this problem. I have implemented an R-Tree then; and it solves the problem but it is slow (or my implementation sucks).

Thank you

Note: I have worked on R-Tree implementation and it works fine now.

like image 513
Kaveh Shahbazian Avatar asked Oct 10 '22 13:10

Kaveh Shahbazian


2 Answers

Since you are not inserting/deleting and presumably have plenty of time to preprocess the data, you can use some additional memory to speed up computations. The basic idea for pre-processing:

  1. Take all of the polygon points and determine the smallest axis-aligned bounding rectangle that contains them all; basically this is the min and max of X and Y.
  2. Choose a partioning factor dX and dY that you will use to create a search grid. Choosing powers of two for the partioning factors can make for slightly faster computation later.
  3. Translate the polygon data so that their bounding rectangle minimum is coincident with (0,0) and expand the rectangle so that it is an integer multiple of the partitioning factor in each dimension.
  4. Consider each grid square and make a list of the polygons that intersect the square. Store this list for each grid square. Depending on the nature of the data (how many polygons you can ever expect to intersect a square), there are various ways you can optimize this for either storage space or speed.

Now, when you want to find regions that contain a point:

  1. Translate the point using the origin we defined earlier and determine the grid square containing the point (if you used a power of two, this is a shift operation; otherwise it's division.)
  2. Look at the list for the grid square. If it's empty, there is no containing polygon. If not, you have to consider each of the polygons in the list and search for intersection.

This works well for spread out and mostly non-intersecting polygons, particularly if you can choose a grid size fine enough so that there are only a few polygons per square. It will be slow in cases where you hit squares with lots of intersecting polygons. One additional optimization is to have a flag for each listed polygon at a square to indicate that the square is completely contained within the polygon; this allows you to avoid the slow containment test in many cases at the cost of a single bit per polygon entry. This is particularly valuable if your grid spacing is fine compared to the polygon sizes, as most squares will not be at intersections or edges.

If you need even more speed, you can start storing edge information at each square with the polygon reference. You only need to test against the polygon edges that actually intersect the area of the square. This can reduce the effort to only a handful of edge tests per polygon.

like image 174
Dan Bryant Avatar answered Oct 13 '22 10:10

Dan Bryant


A R-Tree data structure can be used for this problem.

like image 36
Kaveh Shahbazian Avatar answered Oct 13 '22 12:10

Kaveh Shahbazian