I want to search a sorted list of strings for all of the elements that start with a given substring.
Here's an example that finds all of the exact matches:
import bisect
names = ['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert', 'chris']
names.sort()
leftIndex = bisect.bisect_left(names, 'bob')
rightIndex = bisect.bisect_right(names, 'bob')
print(names[leftIndex:rightIndex])
Which prints ['bob', 'bob', 'bob']
.
Instead, I want to search for all the names that start with 'bob'. The output I want is ['bob', 'bob', 'bob', 'bobby', 'bobert']
. If I could modify the comparison method of the bisect search, then I could use name.startswith('bob')
to do this.
As an example, in Java it would be easy. I would use:
Arrays.binarySearch(names, "bob", myCustomComparator);
where 'myCustomComparator' is a comparator that takes advantage of the startswith method (and some additional logic).
How do I do this in Python?
bisect
can be fooled into using a custom comparison by using an instance that uses the custom comparator of your chosing:
>>> class PrefixCompares(object):
... def __init__(self, value):
... self.value = value
... def __lt__(self, other):
... return self.value < other[0:len(self.value)]
...
>>> import bisect
>>> names = ['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert', 'chris']
>>> names.sort()
>>> key = PrefixCompares('bob')
>>> leftIndex = bisect.bisect_left(names, key)
>>> rightIndex = bisect.bisect_right(names, key)
>>> print(names[leftIndex:rightIndex])
['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert']
>>>
DOH. the right bisect worked, but the left one obviously didn't. "adam" is not prefixed with "bob"!. to fix it, you have to adapt the sequence, too.
>>> class HasPrefix(object):
... def __init__(self, value):
... self.value = value
... def __lt__(self, other):
... return self.value[0:len(other.value)] < other.value
...
>>> class Prefix(object):
... def __init__(self, value):
... self.value = value
... def __lt__(self, other):
... return self.value < other.value[0:len(self.value)]
...
>>> class AdaptPrefix(object):
... def __init__(self, seq):
... self.seq = seq
... def __getitem__(self, key):
... return HasPrefix(self.seq[key])
... def __len__(self):
... return len(self.seq)
...
>>> import bisect
>>> names = ['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert', 'chris']
>>> names.sort()
>>> needle = Prefix('bob')
>>> haystack = AdaptPrefix(names)
>>> leftIndex = bisect.bisect_left(haystack, needle)
>>> rightIndex = bisect.bisect_right(haystack, needle)
>>> print(names[leftIndex:rightIndex])
['bob', 'bob', 'bob', 'bobby', 'bobert']
>>>
Unfortunately bisect
does not allow you to specify a key
function. What you can do though is add '\xff\xff\xff\xff'
to the string before using it to find the highest index, then take those elements.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With