Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Python, find item in list of dicts, using bisect

I have a list of dicts, something like this:

test_data = [
    { 'offset':0, 'data':1500 },
    { 'offset':1270, 'data':120 },
    { 'offset':2117, 'data':30 },
    { 'offset':4055, 'data':30000 },
]

The dict items are sorted in the list according to the 'offset' data. The real data could be much longer.

What I want to do is lookup an item in the list given a particular offset value, which is not exactly one of those values, but in that range. So, a binary search is what I want to do.

I am now aware of the Python bisect module, which is a ready-made binary search—great, but not directly usable for this case. I'm just wondering what is the easiest way to adapt bisect to my needs. Here is what I came up with:

import bisect

class dict_list_index_get_member(object):
    def __init__(self, dict_list, member):
        self.dict_list = dict_list
        self.member = member
    def __getitem__(self, index):
        return self.dict_list[index][self.member]
    def __len__(self):
        return self.dict_list.__len__()

test_data_index_get_offset = dict_list_index_get_member(test_data, 'offset')
print bisect.bisect(test_data_index_get_offset, 1900)

It prints:

2

My question is, is this the best way to do what I want, or is there some other simpler, better way?

like image 854
Craig McQueen Avatar asked Aug 27 '09 23:08

Craig McQueen


1 Answers

You could also use one of Python's many SortedDict implementations to manage your test_data. A sorted dict sorts the elements by key and maintains a mapping to a value. Some implementations also support a bisect operation on the keys. For example, the Python sortedcontainers module has a SortedDict that meets your requirements.

In your case it would look something like:

from sortedcontainers import SortedDict
offset_map = SortedDict((item['offset'], item['data']) for item in test_data)
index = offset_map.bisect(1275)
key = offset_map.iloc[index]
print offset_map[key]
# 120

The SortedDict type has a bisect function which returns the bisected index of the desired key. With that index, you can lookup the actual key. And with that key you can get the value.

All of these operations are very fast in sortedcontainers which is also conveniently implemented in pure-Python. There's a performance comparison too which discusses other choices and has benchmark data.

like image 85
GrantJ Avatar answered Oct 03 '22 14:10

GrantJ