
Spatial data structure for finding all points greater than or less than a value in each cartesian dimension

I am currently working on an optimization problem that requires finding all points greater than (or, in some cases, less than) a particular point in each coordinate dimension. For example, in 2D, I might need to find all points that satisfy the condition:

x > x* and y < y*
for an arbitrary point (x*, y*).

(e.g., if the blue point in the plot below is (x*, y*), I need all the points in the box defined by the blue dashed lines.)

Note: I need this to be an N-dimensional structure/search, as my actual optimization problem has more than 2 objectives to solve for. A typical search space would be on the order of 1000-5000 points with 2 to 5 dimensions.

Is there any particular data structure well suited to this purpose? In the past I have used kd-trees to find nearest neighbors and all points within a radius, but in this case I need a directional search. It looks like some form of R-tree might do the trick, where my search rectangle would extend from (x*, y*) out to a very large positive value in some dimensions and a very large negative value in the others. Is there a better data structure specific to this kind of search?
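
That R-tree idea can be prototyped directly. A minimal sketch, assuming the Python rtree package (a libspatialindex wrapper) and its default interleaved (xmin, ymin, xmax, ymax) bounding-box format; the variable names and the choice of 1e18 as a stand-in for infinity are mine:

    from rtree import index

    # build a 2-D R-tree with one degenerate rectangle per point
    idx = index.Index()
    points = [(1.0, 4.0), (2.5, 0.5), (3.0, 2.0)]
    for i, (x, y) in enumerate(points):
        idx.insert(i, (x, y, x, y))

    # all points with x > x* and y < y*: query the half-open box
    # from (x*, -"infinity") to (+"infinity", y*)
    BIG = 1e18
    x_star, y_star = 2.0, 3.0
    hits = [points[i] for i in idx.intersection((x_star, -BIG, BIG, y_star))]

Note that intersection() is inclusive of the box boundary, so points with x == x* or y == y* would need to be filtered out afterwards if strict inequality matters.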

[example plot: the blue query point (x*, y*) with blue dashed lines bounding the region x > x*, y < y*]
asked Mar 11 '14 by MarkD



1 Answer

From what I've read in your comments, you are given a set of points P_1, ..., P_n and you want to find, for every point P_i = (x_1, ..., x_d), the number of points P_j = (y_1, ..., y_d) with x_k R_k y_k for all k, where each R_k is one of (<, >).

One observation is that you can sort the points by their first coordinate and add them to your data structure in the order in which they become visible with regard to that coordinate. This removes one dimension from your search problem, so in the d-dimensional case, your problem now reduces to a (d-1)-dimensional orthogonal range query problem. This optimization is totally independent of how you actually solve the orthogonal range queries and by itself might make a brute-force implementation more viable.
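
As a concrete 2-D illustration of this reduction (my own sketch, not from the original answer): after sorting by decreasing x, every query collapses to the 1-D count "how many already-inserted points have y < y*", which a Fenwick tree over y-ranks answers in O(log n). Names are illustrative, and distinct x values are assumed:

    class Fenwick:
        """Fenwick (binary indexed) tree counting enabled ranks."""
        def __init__(self, n):
            self.t = [0] * (n + 1)

        def add(self, i):                   # enable 1-based rank i
            while i < len(self.t):
                self.t[i] += 1
                i += i & -i

        def prefix(self, i):                # number of enabled ranks <= i
            s = 0
            while i > 0:
                s += self.t[i]
                i -= i & -i
            return s

    def dominance_counts(points):
        """For each 2-D point, count points with strictly larger x and
        strictly smaller y, in O(n log n) overall."""
        ys = sorted({y for _, y in points})
        rank = {y: i + 1 for i, y in enumerate(ys)}   # 1-based y-ranks
        fen = Fenwick(len(ys))
        out = {}
        # sweep in decreasing x: everything already inserted has larger x
        # (ties in x would need grouping before insertion)
        for x, y in sorted(points, key=lambda p: -p[0]):
            out[(x, y)] = fen.prefix(rank[y] - 1)     # inserted points with smaller y
            fen.add(rank[y])
        return out

The same sweep works in higher dimensions, with the Fenwick tree replaced by a structure over the remaining d-1 coordinates.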

Since your point set is more or less static (you can build a data structure on the complete set of points, initially set the counters of all nodes to zero, and then "insert" points by enabling them), you can use a range tree to solve the orthogonal range queries. The trivial implementation gives you O(n log^(d-2) n) preprocessing, and O(log^(d-2) n) query time with a slightly enhanced version of the data structure (fractional cascading).
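
For reference, a compact range-tree sketch for orthogonal range counting, without the fractional-cascading enhancement (so each query pays an extra log factor over the bound above) and with a naive build that re-sorts at every node; class and method names are mine, and a non-empty point set is assumed:

    import bisect

    class RangeTree:
        """Static range tree counting points in an open axis-aligned box."""
        def __init__(self, points, dim=0):
            self.dim = dim
            self.d = len(points[0])
            pts = sorted(points, key=lambda p: p[dim])
            self.keys = [p[dim] for p in pts]
            self.left = self.right = self.assoc = None
            if dim < self.d - 1:
                # associated tree on the remaining dimensions
                self.assoc = RangeTree(pts, dim + 1)
                if len(pts) > 1:
                    mid = len(pts) // 2
                    self.left = RangeTree(pts[:mid], dim)
                    self.right = RangeTree(pts[mid:], dim)

        def count(self, lo, hi):
            """Number of points p with lo[k] < p[k] < hi[k] for all k."""
            if self.dim == self.d - 1:
                # last dimension: a plain binary search suffices
                return (bisect.bisect_left(self.keys, hi[self.dim])
                        - bisect.bisect_right(self.keys, lo[self.dim]))
            if hi[self.dim] <= self.keys[0] or lo[self.dim] >= self.keys[-1]:
                return 0                          # disjoint in this dimension
            if lo[self.dim] < self.keys[0] and self.keys[-1] < hi[self.dim]:
                return self.assoc.count(lo, hi)   # fully contained: drop a dimension
            return self.left.count(lo, hi) + self.right.count(lo, hi)

The half-open queries then use infinite bounds, e.g. tree.count((x_star, float('-inf')), (float('inf'), y_star)) for x > x* and y < y*.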

So in total the cost for one phase would then be O(n log^(d-2) n). Since your n is so small, I'm not sure whether this would actually buy you a lot (it's probably not sensible for the 5-d case, maybe not even for 4-d).
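
Given n in the low thousands, a vectorized brute-force scan is also a sensible baseline to benchmark the fancier structures against. A minimal sketch assuming NumPy; the helper name and sign convention are mine:

    import numpy as np

    def dominated_mask(points, query, signs):
        """Boolean mask of the points satisfying the directional condition
        in every dimension: signs[k] = +1 means '> query[k]' in dimension k,
        signs[k] = -1 means '< query[k]'."""
        pts = np.asarray(points, dtype=float)
        q = np.asarray(query, dtype=float)
        s = np.asarray(signs, dtype=float)
        # s * (pts - q) > 0 is pts > q where s = +1, and pts < q where s = -1
        return np.all(s * (pts - q) > 0, axis=1)

    # e.g. all points with x > x* and y < y*:
    # mask = dominated_mask(pts, (x_star, y_star), (+1, -1))

At 5000 points and 5 dimensions that is only 25,000 comparisons per query, which is often fast enough in practice.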

In case you want to go that route, I can highly recommend Erik Demaine's data structures lectures, which cover orthogonal range queries (in L03 and L04). He also covers the conceptually slightly simpler half-open queries that you need, so maybe those can be implemented with a lower constant factor.

The original range tree data structure is unfortunately completely static, but dynamized variants exist, so maybe you can even reuse trees between phases that share many points and make use of variants with better query time. You can search for "dynamic orthogonal range query" for more information; there are a couple of papers on the subject. I doubt that this makes sense at your problem scale, though.

If your points are sufficiently randomly distributed, you can probably get away with partitioning the points into O(n^0.5) rectangular buckets. A bucket that lies entirely inside the query region can be counted in O(1), a partially covered bucket can be scanned in O(n^0.5), and the implementation has very low overhead compared to range trees. Combined with the sorting by one coordinate, this should give you decent speedups. This is definitely what I would try first.
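
A 2-D sketch of that bucketing idea (the grid granularity and names are mine; assumes a non-empty point set):

    class GridCounter:
        """Bucket grid with about sqrt(n) rectangular cells, each holding
        about sqrt(n) points when the input is roughly uniform."""
        def __init__(self, points):
            n = len(points)
            self.g = g = max(1, round(n ** 0.25))   # g*g ~ sqrt(n) buckets
            xs = [p[0] for p in points]
            ys = [p[1] for p in points]
            self.x0, self.y0 = min(xs), min(ys)
            self.dx = (max(xs) - self.x0) / g or 1.0
            self.dy = (max(ys) - self.y0) / g or 1.0
            self.cells = [[[] for _ in range(g)] for _ in range(g)]
            for x, y in points:
                ix = self._cell(x, self.x0, self.dx)
                iy = self._cell(y, self.y0, self.dy)
                self.cells[ix][iy].append((x, y))

        def _cell(self, v, v0, dv):
            return min(self.g - 1, int((v - v0) / dv))

        def count(self, xlo, xhi, ylo, yhi):
            """Points strictly inside the open box (xlo, xhi) x (ylo, yhi)."""
            total = 0
            for i in range(self.g):
                cx0 = self.x0 + i * self.dx
                cx1 = cx0 + self.dx
                if cx1 <= xlo or cx0 >= xhi:
                    continue                        # column misses the box
                x_full = xlo < cx0 and cx1 < xhi
                for j in range(self.g):
                    cy0 = self.y0 + j * self.dy
                    cy1 = cy0 + self.dy
                    if cy1 <= ylo or cy0 >= yhi:
                        continue
                    cell = self.cells[i][j]
                    if x_full and ylo < cy0 and cy1 < yhi:
                        total += len(cell)          # fully inside: O(1)
                    else:                           # boundary bucket: scan it
                        total += sum(1 for (x, y) in cell
                                     if xlo < x < xhi and ylo < y < yhi)
            return total

For the half-open queries above, pass infinite bounds, e.g. grid.count(x_star, float('inf'), float('-inf'), y_star) for x > x* and y < y*.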

answered Sep 30 '22 by Niklas B.