Given a list of regions on a line:
regions = [(10,25), (18, 30), (45, 60), ...] # so on so forth, regions can be overlapping, of variable size, etc.
I want to know which regions a point X belongs to:
x = 23
find_regions(regions, x) # ==> [(10, 25), (18, 30)]
I know naively (and my current implementation) we can just search in O(n), but a more dramatic use case with thousands of regions (and thousands of look up points, really, is the motivator) justifies investigating a faster approach than this:
regions = [(start, end) for (start, end) in regions if start < x and x < end]
I would hazard a guess that someone has solved this problem before...but I'm not sure how it would be best accomplished. Thoughts?
This is the exact job interval trees were designed to do. Googling Python interval tree
turns up an existing library called Banyan that implements them, though I can't speak for its reliability, and it doesn't seem to be actively developed. You could also implement your own interval tree.
The preprocessing time to construct an interval tree from a list of N intervals is in O(Nlog(N)), and unlike some of the other answers, it only takes O(N) space, regardless of how much the intervals overlap. The time to figure out how many intervals overlap a given point is in O(M+log(N)), where M is the number of intervals containing the point.
Banyan interval tree demo, pulled from the PyPI page:
>>> t = SortedSet([(1, 3), (2, 4), (-2, 9)], updator = OverlappingIntervalsUpdator)
>>>
>>> print(t.overlap_point(-5))
[]
>>> print(t.overlap_point(5))
[(-2, 9)]
>>> print(t.overlap_point(3.5))
[(-2, 9), (2, 4)]
>>>
>>> print(t.overlap((-10, 10)))
[(-2, 9), (1, 3), (2, 4)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With