I have a list of dicts, something like this:
test_data = [
{ 'offset':0, 'data':1500 },
{ 'offset':1270, 'data':120 },
{ 'offset':2117, 'data':30 },
{ 'offset':4055, 'data':30000 },
]
The dict items are sorted in the list according to the 'offset'
data. The real data could be much longer.
What I want to do is lookup an item in the list given a particular offset value, which is not exactly one of those values, but in that range. So, a binary search is what I want to do.
I am now aware of the Python bisect
module, which is a ready-made binary search—great, but not directly usable for this case. I'm just wondering what is the easiest way to adapt bisect
to my needs. Here is what I came up with:
import bisect
class dict_list_index_get_member(object):
def __init__(self, dict_list, member):
self.dict_list = dict_list
self.member = member
def __getitem__(self, index):
return self.dict_list[index][self.member]
def __len__(self):
return self.dict_list.__len__()
test_data_index_get_offset = dict_list_index_get_member(test_data, 'offset')
print bisect.bisect(test_data_index_get_offset, 1900)
It prints:
2
My question is, is this the best way to do what I want, or is there some other simpler, better way?
You could also use one of Python's many SortedDict implementations to manage your test_data. A sorted dict sorts the elements by key and maintains a mapping to a value. Some implementations also support a bisect operation on the keys. For example, the Python sortedcontainers module has a SortedDict that meets your requirements.
In your case it would look something like:
from sortedcontainers import SortedDict
offset_map = SortedDict((item['offset'], item['data']) for item in test_data)
index = offset_map.bisect(1275)
key = offset_map.iloc[index]
print offset_map[key]
# 120
The SortedDict type has a bisect function which returns the bisected index of the desired key. With that index, you can lookup the actual key. And with that key you can get the value.
All of these operations are very fast in sortedcontainers which is also conveniently implemented in pure-Python. There's a performance comparison too which discusses other choices and has benchmark data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With