Given a set of distinct points in 2D space and a rectangle (the coordinates of all four corners, with sides parallel to the x and y axes), how can I quickly find which points are inside the rectangle?
I'm not interested in the basic solution of going through all the points and checking which ones are inside the rectangle. What I'm looking for is an algorithm that gives me fast query times per rectangle.
I don't care how much time I spend in the preprocessing step. What I do care about is that, after I process my data, I obtain a useful structure that gives me fast query times per rectangle.
I know, for example, that I can count how many points are inside a rectangle in O(log N). That works because I do a lot of heavy processing up front, then query the processed data with each new rectangle and get a new count in O(log N) time. I'm looking for a similar algorithm for finding the actual points, not just their count.
A point lies inside the rectangle exactly when its x-coordinate lies between the rectangle's x-coordinates (x1, x2) and its y-coordinate also lies between the rectangle's y-coordinates (y1, y2).
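A small Python sketch of that test, assuming points on the boundary count as inside and the two opposite corners may be given in either order. `point_in_rect` is a hypothetical helper name. The structures discussed in the answers still fall back on this per-point check for any point they cannot accept or reject wholesale.

```python
def point_in_rect(px, py, x1, y1, x2, y2):
    """Return True if (px, py) lies inside or on the border of the axis-aligned
    rectangle with opposite corners (x1, y1) and (x2, y2)."""
    # Normalize so the corners can be passed in any order.
    xmin, xmax = min(x1, x2), max(x1, x2)
    ymin, ymax = min(y1, y2), max(y1, y2)
    # Inside means: within the x-interval AND within the y-interval.
    return xmin <= px <= xmax and ymin <= py <= ymax


print(point_in_rect(2, 3, 0, 0, 5, 5))  # True
print(point_in_rect(6, 3, 0, 0, 5, 5))  # False
```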
A classical answer is the kD-tree (2D-tree in this case).
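A minimal 2D-tree sketch in Python, assuming points are `(x, y)` tuples and the rectangle boundary counts as inside; `build` and `range_query` are illustrative names, not a library API. The tree splits at the median, alternating between x and y, and the query only descends into a subtree whose side of the splitting line can still intersect the rectangle.

```python
class KDNode:
    __slots__ = ("point", "axis", "left", "right")

    def __init__(self, point, axis, left, right):
        self.point = point  # the splitting point stored at this node
        self.axis = axis    # 0 = split on x, 1 = split on y
        self.left = left    # subtree whose coordinate on `axis` is <= point[axis]
        self.right = right  # subtree whose coordinate on `axis` is >= point[axis]


def build(points, depth=0):
    """Build a balanced 2D-tree by splitting at the median, alternating x and y.
    Sorting at every level costs O(n log^2 n), which is fine for one-off preprocessing."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build(pts[:mid], depth + 1),
                  build(pts[mid + 1:], depth + 1))


def range_query(node, xmin, ymin, xmax, ymax, out):
    """Append to `out` every point with xmin <= x <= xmax and ymin <= y <= ymax."""
    if node is None:
        return
    x, y = node.point
    if xmin <= x <= xmax and ymin <= y <= ymax:
        out.append(node.point)
    lo, hi = (xmin, xmax) if node.axis == 0 else (ymin, ymax)
    split = node.point[node.axis]
    # Prune: the left subtree only holds coordinates <= split, the right only >= split.
    if lo <= split:
        range_query(node.left, xmin, ymin, xmax, ymax, out)
    if hi >= split:
        range_query(node.right, xmin, ymin, xmax, ymax, out)


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
found = []
range_query(tree, 3, 1, 8, 5, found)
print(sorted(found))  # [(5, 4), (7, 2), (8, 1)]
```

The tree is built once and then reused for every rectangle, which matches the "expensive preprocessing, cheap queries" requirement in the question.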
For a simple alternative, if your points are spread uniformly enough, you can try gridding.
Choose a cell size for a square grid (if the problem is anisotropic, use a rectangular grid). Assign every point to the grid cell that contains it, stored in a linked list. When you perform a query, find all cells that are overlapped by the rectangle and scan them to traverse their lists. For the partially covered cells, you will need to perform the point-in-rectangle test.
The choice of the size is important: too large can result in too many points needing to be tested anyway; too small can result in too many empty cells.
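A minimal sketch of the gridding approach in Python. Assumptions: points are `(x, y)` tuples, the boundary counts as inside, Python lists stand in for the linked lists, and `cell` is a hypothetical side length you would tune as described above. Cells strictly inside the rectangle are taken whole; only the cells on the rectangle's border get the per-point test.

```python
from collections import defaultdict
from math import floor


class Grid:
    """Uniform square grid; each cell keeps a list of the points that fall in it."""

    def __init__(self, points, cell):
        self.cell = cell
        self.cells = defaultdict(list)
        for p in points:
            self.cells[self._key(p[0], p[1])].append(p)

    def _key(self, x, y):
        # Integer cell coordinates of the cell containing (x, y).
        return (floor(x / self.cell), floor(y / self.cell))

    def query(self, xmin, ymin, xmax, ymax):
        cx0, cy0 = self._key(xmin, ymin)
        cx1, cy1 = self._key(xmax, ymax)
        out = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                bucket = self.cells.get((cx, cy))  # .get does not create empty cells
                if not bucket:
                    continue
                if cx0 < cx < cx1 and cy0 < cy < cy1:
                    # Fully covered cell: every point in it is inside the rectangle.
                    out.extend(bucket)
                else:
                    # Partially covered border cell: test each point.
                    out.extend(p for p in bucket
                               if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax)
        return out


g = Grid([(0.5, 0.5), (1.5, 2.5), (3.2, 0.1), (2.0, 2.0)], cell=1.0)
print(sorted(g.query(0, 0, 2, 2)))  # [(0.5, 0.5), (2.0, 2.0)]
```

The query visits every cell the rectangle overlaps, empty or not, which is exactly where the too-small-cell cost shows up.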
You are looking for kd-tree range search or range query.
The worst case is O(n), but this worst case happens pretty often. All these algorithms run queries in O(log n + k) time on average, where k is the number of matched points.
Gridding, as Yves suggested, can perform a range search in O(k) time, but only when the size of the query rectangle is bounded. This is what is often done in particle simulations. Gridding can be used even when the input set is not bounded: just use a fixed number of buckets indexed by a hash of the grid coordinates. But if the query rectangle can be of arbitrary size, then gridding is a no-go.
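A minimal sketch of the hashed variant for an unbounded input set, in Python. Assumptions: `cell` and `nbuckets` are illustrative parameters, and the two hash multipliers are constants commonly seen in spatial-hashing code; any well-mixing hash of the cell coordinates would do. Because unrelated cells can land in the same bucket, every candidate point is tested, and each bucket is scanned at most once per query so no point is reported twice.

```python
from math import floor


class HashedGrid:
    """Grid over the whole plane with a fixed number of hash buckets, so memory
    does not depend on how far apart the points are."""

    def __init__(self, points, cell, nbuckets=1024):
        self.cell = cell
        self.nbuckets = nbuckets
        self.buckets = [[] for _ in range(nbuckets)]
        for p in points:
            self.buckets[self._bucket(*self._cell(p[0], p[1]))].append(p)

    def _cell(self, x, y):
        return floor(x / self.cell), floor(y / self.cell)

    def _bucket(self, cx, cy):
        # Python's % always returns a non-negative result, even for negative cells.
        return ((cx * 73856093) ^ (cy * 19349663)) % self.nbuckets

    def query(self, xmin, ymin, xmax, ymax):
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        seen = set()
        out = []
        # Work is proportional to the number of cells the rectangle covers,
        # which is why the rectangle size must be bounded.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                b = self._bucket(cx, cy)
                if b in seen:  # several cells may share one bucket
                    continue
                seen.add(b)
                out.extend(p for p in self.buckets[b]
                           if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax)
        return out


hg = HashedGrid([(-1e6, 3.0), (10.5, 10.5), (11.0, 9.0)], cell=5.0, nbuckets=64)
print(sorted(hg.query(9, 8, 12, 11)))  # [(10.5, 10.5), (11.0, 9.0)]
```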
You could group points into sectors. If a sector is completely inside or completely outside the given rectangle, then all points within it are also inside or outside. If a sector is only partially inside, you have to check every point in that sector individually, in time linear in the number of points it holds. Look up k-d tree search.
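One way to make the sectors hierarchical is a point quadtree; the Python sketch below is that swapped-in structure, not something the answer prescribes. Assumptions: the root sector is given up front and contains every point, sectors are half-open boxes `[x0, x1) x [y0, y1)`, and `CAPACITY` is a hypothetical split threshold. A sector entirely inside the rectangle is reported without any tests, a sector entirely outside is skipped, and only partially covered leaves test their points.

```python
class QuadNode:
    """A sector covering [x0, x1) x [y0, y1). A leaf stores its points;
    an internal node splits into four equal child sectors."""

    CAPACITY = 8  # hypothetical: max points per leaf before it splits

    def __init__(self, x0, y0, x1, y1, depth=0):
        self.box = (x0, y0, x1, y1)
        self.depth = depth
        self.points = []
        self.children = None

    def _child_for(self, p):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        # Index 0: lower-left, 1: lower-right, 2: upper-left, 3: upper-right.
        return self.children[(p[0] >= mx) + 2 * (p[1] >= my)]

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.CAPACITY and self.depth < 32:
            x0, y0, x1, y1 = self.box
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            d = self.depth + 1
            self.children = [QuadNode(x0, y0, mx, my, d), QuadNode(mx, y0, x1, my, d),
                             QuadNode(x0, my, mx, y1, d), QuadNode(mx, my, x1, y1, d)]
            pending, self.points = self.points, []
            for q in pending:
                self._child_for(q).insert(q)

    def collect(self, out):
        """Report every point in this sector without testing it."""
        if self.children is None:
            out.extend(self.points)
        else:
            for c in self.children:
                c.collect(out)

    def query(self, xmin, ymin, xmax, ymax, out):
        x0, y0, x1, y1 = self.box
        if x1 <= xmin or x0 > xmax or y1 <= ymin or y0 > ymax:
            return  # sector completely outside: skip all its points
        if xmin <= x0 and x1 <= xmax and ymin <= y0 and y1 <= ymax:
            self.collect(out)  # sector completely inside: take all its points
        elif self.children is None:
            # Partially covered leaf: test each point individually.
            out.extend(p for p in self.points
                       if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax)
        else:
            for c in self.children:
                c.query(xmin, ymin, xmax, ymax, out)


root = QuadNode(0, 0, 100, 100)  # the root sector must contain every point
for i in range(50):
    root.insert((i * 2.0, (i * 37) % 100))
found = []
root.query(10, 10, 40, 60, found)
print(len(found))  # 8
```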
Along with other answers, you can also look into Morton codes (z-order curve sorting).
Since your data is static, you can even store all the points in a single array.
https://en.wikipedia.org/wiki/Z-order_curve
This paper also has a rather complicated timeline of different "multi-dimensional access methods": http://www.cc.gatech.edu/computing/Database/readinggroup/articles/p170-gaede.pdf
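A minimal Python sketch of the Morton-code idea, assuming non-negative integer coordinates below 2**16 (real data would first be quantized to such a grid); `morton` and `ZOrderIndex` are hypothetical names. The points live in one array sorted by Morton code. Because bit interleaving preserves the order of each coordinate, every point inside the rectangle has a code between the codes of its lower-left and upper-right corners, so two binary searches narrow the scan to one slice of the array. That slice can still contain many points outside the rectangle; a fuller implementation would skip such runs (for example with the BIGMIN technique), which this sketch does not do.

```python
from bisect import bisect_left, bisect_right


def part1by1(v):
    """Spread the low 16 bits of v so that a zero bit separates each of them."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v


def morton(x, y):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    return part1by1(x) | (part1by1(y) << 1)


class ZOrderIndex:
    """Static points stored in a single array sorted by Morton code."""

    def __init__(self, points):
        self.entries = sorted((morton(x, y), (x, y)) for x, y in points)
        self.codes = [c for c, _ in self.entries]

    def query(self, xmin, ymin, xmax, ymax):
        # All points in the rectangle have codes in [morton(min corner), morton(max corner)].
        lo = bisect_left(self.codes, morton(xmin, ymin))
        hi = bisect_right(self.codes, morton(xmax, ymax))
        # The slice may include points outside the rectangle, so filter it.
        return [p for _, p in self.entries[lo:hi]
                if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]


idx = ZOrderIndex([(1, 1), (2, 5), (6, 3), (7, 7), (3, 2)])
print(sorted(idx.query(1, 1, 3, 5)))  # [(1, 1), (2, 5), (3, 2)]
```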