Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimized approach to finding which pairs of numbers another number falls between?

Given a list of regions on a line:

regions = [(10,25), (18, 30), (45, 60), ...] # so on so forth, regions can be overlapping, of variable size, etc.

I want to know which regions a point X belongs to:

x = 23
find_regions(regions, x) # ==> [(10, 25), (18, 30)]

I know naively (and my current implementation) we can just search in O(n), but a more dramatic use case with thousands of regions (and thousands of look up points, really, is the motivator) justifies investigating a faster approach than this:

regions = [(start, end) for (start, end) in regions if start < x and x < end]

I would hazard a guess that someone has solved this problem before...but I'm not sure how it would be best accomplished. Thoughts?

like image 757
zaczap Avatar asked Nov 01 '22 05:11

zaczap


1 Answers

This is the exact job interval trees were designed to do. Googling Python interval tree turns up an existing library called Banyan that implements them, though I can't speak for its reliability, and it doesn't seem to be actively developed. You could also implement your own interval tree.

The preprocessing time to construct an interval tree from a list of N intervals is in O(Nlog(N)), and unlike some of the other answers, it only takes O(N) space, regardless of how much the intervals overlap. The time to figure out how many intervals overlap a given point is in O(M+log(N)), where M is the number of intervals containing the point.

Banyan interval tree demo, pulled from the PyPI page:

>>> t = SortedSet([(1, 3), (2, 4), (-2, 9)], updator = OverlappingIntervalsUpdator)
>>>
>>> print(t.overlap_point(-5))
[]
>>> print(t.overlap_point(5))
[(-2, 9)]
>>> print(t.overlap_point(3.5))
[(-2, 9), (2, 4)]
>>>
>>> print(t.overlap((-10, 10)))
[(-2, 9), (1, 3), (2, 4)]
like image 51
user2357112 supports Monica Avatar answered Nov 15 '22 03:11

user2357112 supports Monica