I read that question about how to use bisect
on a list of tuples, and I used that information to answer that question. It works, but I'd like a more generic solution.
Since bisect
doesn't allow to specify a key
function, if I have this:
import bisect
test_array = [(1,2),(3,4),(5,6),(5,7000),(7,8),(9,10)]
and I want to find the first item where x > 5
for those (x,y)
tuples (not considering y
at all, I'm currently doing this:
bisect.bisect_left(test_array,(5,10000))
and I get the correct result because I know that no y
is greater than 10000, so bisect
points me to the index of (7,8)
. Had I put 1000
instead, it would have been wrong.
For integers, I could do
bisect.bisect_left(test_array,(5+1,))
but in the general case when there may be floats, how to to that without knowing the max values of the 2nd element?
test_array = [(1,2),(3,4),(5.2,6),(5.2,7000),(5.3,8),(9,10)]
I have tried this:
bisect.bisect_left(test_array,(min_value+sys.float_info.epsilon,))
and it didn't work, but I have tried this:
bisect.bisect_left(test_array,(min_value+sys.float_info.epsilon*3,))
and it worked. But it feels like a bad hack. Any clean solutions?
bisect
supports arbitrary sequences. If you need to use bisect
with a key, instead of passing the key to bisect
, you can build it into the sequence:
class KeyList(object):
# bisect doesn't accept a key function, so we build the key into our sequence.
def __init__(self, l, key):
self.l = l
self.key = key
def __len__(self):
return len(self.l)
def __getitem__(self, index):
return self.key(self.l[index])
Then you can use bisect
with a KeyList
, with O(log n) performance and no need to copy the bisect
source or write your own binary search:
bisect.bisect_right(KeyList(test_array, key=lambda x: x[0]), 5)
This is a (quick'n'dirty) bisect_left implementation that allows an arbitrary key function:
def bisect(lst, value, key=None):
if key is None:
key = lambda x: x
def bis(lo, hi=len(lst)):
while lo < hi:
mid = (lo + hi) // 2
if key(lst[mid]) < value:
lo = mid + 1
else:
hi = mid
return lo
return bis(0)
> from _operator import itemgetter
> test_array = [(1, 2), (3, 4), (4, 3), (5.2, 6), (5.2, 7000), (5.3, 8), (9, 10)]
> print(bisect(test_array, 5, key=itemgetter(0)))
3
This keeps the O(log_N)
performance up since it does not assemble a new list
of keys. The implementation of binary search is widely available, but this was taken straight from the bisect_left
source.
It should also be noted that the list needs to be sorted with regard to the same key function.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With