Is there a way to avoid the linear search on this?

Question

I have a large pool of objects with starting number and ending number. For example:

(999, 2333, data) 
(0, 128, data) 
(235, 865, data)
...

Assuming that the intervals don't overlap with each other. And I am writing a function that takes a number and locate the object that (low, high) contains it. Say given 333, I want the 3rd objects on the list.

Is there any way I can do this as efficiently as possible, short of linear search? I was thinking about binary search, but having some difficulties of coping with the range check.

amit · Accepted Answer

Think if it worth sorting the data.
If you only want to search a few times, then it doesn't - and you cannot avoid linear search. total complexity of your searches will be O(n*k), where n is the number of elements and k is the number of searches.

If you want to search a lot of times, then you should first sort, and then search using binary search. It will be O(nlogn) for sorting and O(klogn) for searching k times, so you get total of O((n+k)logn).

Thus, sorting and then searching should be done only if k>=logn

P.S. You can use another approach to sort and search, as proposed in other answers, in all ways, the conclusion remains: do it only if k>=logn

guettli · Answer

You can use the bisect module: http://docs.python.org/library/bisect.html

You need to sort your data once, then use bisect:

import bisect
data=None
tuples=[(0, 128, None), (235, 865, None), (999, 2333, None)]
tuples.sort()
print tuples
print bisect.bisect(tuples, (-1,))   # 0
print bisect.bisect(tuples, (1,))    # 1
print bisect.bisect(tuples, (333,))  # 2
print bisect.bisect(tuples, (3333,)) # 3

cyborg · Answer

If search speed is of utmost importance, then you can create a look-up table (as already commented by S.Lott). This will take O( r ) memory (where r is the size of the range), O( r ) pre-processing time, and O(1) search time. Create an array of the size of the range, and populate each element with a pointer to the data or null.

lookup = {}
for low, high, data in source_ranges:
    for i in range(low,high): # or maybe high+1 if the ranges are inclusive
        lookup[i] = data

Now the lookup is trivial.

lookup = {}
for low, high, data in source_ranges:
    for i in range(low,high): # or maybe high+1 if the ranges are inclusive
        lookup[i] = data

Now the lookup is trivial.

NPE · Answer

First of all, it is not at all clear that binary search is warranted here. It may well be that linear search is faster when the number of intervals is small.

If you're concerned about performance, the prudent thing to do is to profile the code, and perhaps benchmark both methods on your typical inputs.

Disclaimers aside, binary search could be implemented by sorting the intervals once, and then repeatedly using the bisect module to do the search:

import bisect

intervals = [(999, 2333, 'int1'), (0, 128, 'int2'), (235, 865, 'int3')]
intervals.sort()

def find_int(intervals, val):
   pos = bisect.bisect_left([interval[1] for interval in intervals], val)
   if pos < len(intervals) and val >= intervals[pos][0]:
      return intervals[pos]
   else:
      return None

print(find_int(intervals, 0))
print(find_int(intervals, 1))
print(find_int(intervals, 200))
print(find_int(intervals, 998))
print(find_int(intervals, 999))
print(find_int(intervals, 1000))
print(find_int(intervals, 2333))
print(find_int(intervals, 2334))

In the above, I assume that the intervals are non-overlapping, and that the interval includes both its start and end points.

Lastly, to improve performance, one might consider factoring [interval[1] for interval in intervals] out of the function and doing it just once at the start.

Is there a way to avoid the linear search on this?

Tags:

python

algorithm

Oliver

4 Answers

amit

guettli

cyborg

NPE

Recent Activity

Donate For Us

Is there a way to avoid the linear search on this?

Tags:

python

algorithm

Oliver

4 Answers

amit

guettli

cyborg

NPE

Related questions

Recent Activity

Donate For Us