
Python binary search-like function to find first number in sorted list greater than a specific value

I'm trying to write a function in Python that finds the first number in a sorted list greater than a specific value that I pass in as an argument. I've found examples online that use simple list comprehensions to achieve this, but for my purposes I need to be performing this operation frequently and on large lists, so a search that runs in linear time is too expensive.

I've had a crack at writing an iterative binary search-like function to achieve this, though I'm coming across some edge cases where it doesn't work correctly. By the way, the function is not required to deal with a case where there is no larger item in the list. Here is my existing function:

def findFirstLarger(num, sortedList):
    low = 0; 
    high = len(sortedList) - 1

    mid = -1
    while True:
        print("low: " + str(low) + "\t high: " + str(high))
        if (low > high):
            print("Ah geez, low is " + str(low) + " and high is " + str(high))
            return # debugging, don't want this to happen
        if low == high:
            return sortedList[low]
        else:
            mid = (low + high) / 2;
            if num == sortedList[mid]:
                return sortedList[mid]
            elif num > sortedList[mid]:
                low = mid + 1
            else:
                high = mid - 1

One case I have noted where this function does not work is as follows:

>>> somenumbers=[n*2 for n in range(131072)]
>>> somenumbers[-5:]
[262134, 262136, 262138, 262140, 262142]


>>> binsearch.findFirstLarger(262139,somenumbers)
low: 0   high: 131071
low: 65536   high: 131071
low: 98304   high: 131071
low: 114688  high: 131071
low: 122880  high: 131071
low: 126976  high: 131071
low: 129024  high: 131071
low: 130048  high: 131071
low: 130560  high: 131071
low: 130816  high: 131071
low: 130944  high: 131071
low: 131008  high: 131071
low: 131040  high: 131071
low: 131056  high: 131071
low: 131064  high: 131071
low: 131068  high: 131071
low: 131070  high: 131071
low: 131070  high: 131069
Ah geez, low is 131070 and high is 131069

Here the correct result would be 262140, as this is the first number in the list greater than 262139.

Can anyone recommend a cleaner implementation of this that actually works? I didn't think this would be such an esoteric problem, though I haven't been able to find a solution anywhere as of yet.

Bryce Thomas asked Nov 29 '22 18:11


1 Answer

Have you tried the bisect module?

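from bisect import bisect_left

def find_ge(a, key):
    '''Find smallest item greater-than or equal to key.
    Raise ValueError if no such item exists.
    If multiple keys are equal, return the leftmost.
    '''
    # bisect_left returns the leftmost index at which key could be inserted
    # while keeping a sorted, so a[i] is the first item >= key.
    i = bisect_left(a, key)
    if i == len(a):
        raise ValueError('No item found with key at or above: %r' % (key,))
    return a[i]

find_ge(somenumbers, 262139)  # returns 262140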
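
Note that find_ge returns the key itself when it is present in the list. If you need the first item that is strictly greater, as the question title suggests, a variant built on bisect_right might look like the sketch below. The name find_gt and the error message are my own choices for illustration; only bisect_right comes from the standard library.

```python
from bisect import bisect_right

def find_gt(a, key):
    """Return the leftmost item strictly greater than key.

    Raises ValueError if no such item exists.
    """
    # bisect_right returns the insertion point after any items equal to key,
    # so a[i] is the first item > key.
    i = bisect_right(a, key)
    if i == len(a):
        raise ValueError('No item found with key above: %r' % (key,))
    return a[i]

somenumbers = [n * 2 for n in range(131072)]
print(find_gt(somenumbers, 262139))  # 262140
print(find_gt(somenumbers, 262138))  # 262140 (find_ge would return 262138 here)
```

Both bisect functions run in O(log n), so either should be fine for frequent lookups on large lists.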

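Your code has two problems: (1) low > high is a valid termination case, not an error; in your failing example, sortedList[low] at that point is exactly the 262140 you wanted. (2) You should not stop at low == high, because the element at that index has not yet been compared against num, so it can be returned even when it is not larger. For example, with your somenumbers, num == 1 returns 0.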
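
If you would rather keep a hand-written loop, the two points above could be applied roughly as follows. This is only a sketch, not a drop-in replacement: the name find_first_larger is hypothetical, it assumes "larger" means strictly greater (your original returns an equal item when it finds one), and it raises ValueError for the no-larger-item case you said you don't need to handle.

```python
def find_first_larger(num, sorted_list):
    """Return the first item in sorted_list strictly greater than num."""
    low, high = 0, len(sorted_list) - 1
    # Invariant: every item before index low is <= num,
    # and every item after index high is > num.
    while low <= high:
        mid = (low + high) // 2  # integer division, also correct on Python 3
        if sorted_list[mid] > num:
            high = mid - 1   # mid is a candidate; look for an earlier one
        else:
            low = mid + 1    # mid and everything before it is <= num
    # The loop ends with low == high + 1, so low indexes the first larger item.
    if low == len(sorted_list):
        raise ValueError('No item greater than %r' % (num,))
    return sorted_list[low]

somenumbers = [n * 2 for n in range(131072)]
print(find_first_larger(262139, somenumbers))  # 262140
print(find_first_larger(1, somenumbers))       # 2
```

The loop only stops once low has passed high, which is why no separate low == high branch is needed.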

kennytm answered Dec 06 '22 17:12