Python dictionary - binary search for a key?

Tags:

python

I want to write a container class that acts like a dictionary (it actually derives from dict). The keys for this structure will be dates.

When a key (i.e. a date) is used to retrieve a value from the class and that date does not exist, the value stored under the nearest date that precedes the key should be returned.

The following data should help explain the concept further:

Date (key)      Value
2001/01/01      123
2001/01/02       42
2001/01/03      100
2001/01/04      314
2001/01/07      312
2001/01/09      321

If I try to fetch the value associated with the key (date) '2001/01/05', I should get the value stored under the key '2001/01/04', since that key occurs just before where the key '2001/01/05' would be if it existed in the dictionary.

In order to do this, I need to be able to do a search (ideally binary, rather than naively looping through every key in the dictionary). I have searched for binary search (bsearch) key lookups in Python dictionaries, but have not found anything useful.

Anyway, I want to write a class that encapsulates this behavior.

This is what I have so far (not much):

class NearestNeighborDict(dict):
    """
    a dictionary which returns value of nearest neighbor
    if specified key not found
    """

    def __init__(self, items=()):
        dict.__init__(self, items)

    def get_item(self, key):
        # returns the item stored with the key (if key exists)
        # else it returns the item stored with the nearest preceding key
        raise NotImplementedError  # this lookup is the part I am asking about
morpheous asked Jul 02 '10 02:07


2 Answers

The sortedcontainers module provides a SortedDict type that maintains the keys in sorted order and supports bisecting on those keys. The module is a pure-Python, fast-as-C implementation with 100% test coverage and hours of stress testing.

For example:

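from sortedcontainers import SortedDict

# data is assumed to be an iterable of (date, value) pairs,
# e.g. the rows of the table in the question.
sd = SortedDict((date, value) for date, value in data)

# Bisect for the insertion point to the right of the desired key, then
# step back one to land on the key itself or its nearest predecessor.
index = sd.bisect_right('2001/01/05') - 1
if index < 0:
    raise KeyError('2001/01/05')  # no key on or before the requested date

# Lookup the real key at that index.
# (sd.iloc[index] also works, but iloc is deprecated since sortedcontainers 2.)
key = sd.keys()[index]

# Retrieve the value associated with that key.
value = sd[key]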

Because SortedDict supports fast indexing, it's easy to look ahead or behind your key as well. SortedDict is also a MutableMapping so it should work nicely in your type system.

GrantJ answered Oct 12 '22 15:10


You really don't want to subclass dict, because you can't really reuse any of its functionality. Rather, subclass the abstract base class collections.abc.Mapping (or MutableMapping if you also want to be able to modify an instance after creation), implement the indispensable special methods for the purpose, and you'll get the other dict-like methods "for free" from the ABC.

The methods you need to code are __getitem__ (and __setitem__ and __delitem__ if you want mutability), __len__, __iter__, and __contains__.
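
For the mutable variant, here is a minimal sketch, assuming the keys are mutually comparable and that a missing key should fall back to the nearest preceding key (the behaviour the question asks for). The class name `FloorDict` and its internals are hypothetical, not part of any library. It keeps a sorted key list next to a plain dict and inherits `get`, `items`, `update` and the other mixin methods from the ABC:

```python
import bisect
from collections.abc import MutableMapping


class FloorDict(MutableMapping):
    """Hypothetical sketch: a missing key falls back to the nearest preceding key."""

    def __init__(self, contents=()):
        self._data = {}   # exact key -> value
        self._keys = []   # the same keys, kept in sorted order
        self.update(contents)  # update() is supplied by MutableMapping

    def __getitem__(self, key):
        # Position of key itself, or of the closest key below it.
        i = bisect.bisect_right(self._keys, key) - 1
        if i < 0:
            raise KeyError(key)  # key precedes every stored key
        return self._data[self._keys[i]]

    def __setitem__(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # O(log n) search, O(n) list insert
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]  # exact match only; raises KeyError otherwise
        self._keys.pop(bisect.bisect_left(self._keys, key))

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)

    def __contains__(self, key):
        # Exact membership. The inherited version would call __getitem__
        # and report True for any key that has a predecessor.
        return key in self._data
```

With the sample data, `FloorDict(rows)['2001/01/05']` would return the value stored under '2001/01/04', while `'2001/01/05' in FloorDict(rows)` stays False. Whether `in` should be exact like this is a design choice; the sketch assumes it should.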

The bisect module of the standard library gives you all you need to implement these efficiently on top of a sorted list. For example:

from collections.abc import Mapping
import bisect

class MyDict(Mapping):
  def __init__(self, contents):
    "contents must be a sequence of key/value pairs"
    self._list = sorted(contents, key=lambda kv: kv[0])
    # parallel list of keys, so bisect never has to compare values
    self._keys = [k for (k, _) in self._list]
  def __iter__(self):
    return iter(self._keys)
  def __contains__(self, k):
    i = bisect.bisect_left(self._keys, k)
    return i < len(self._keys) and self._keys[i] == k
  def __len__(self):
    return len(self._keys)
  def __getitem__(self, k):
    # index of k itself, or of the nearest key that precedes it
    i = bisect.bisect_right(self._keys, k) - 1
    if i < 0: raise KeyError(k)
    return self._list[i][1]

You'll probably want to fiddle with __getitem__ depending on what you want to return (or whether you want to raise) for various corner cases such as "k greater than all keys in self" or "k smaller than all keys in self".
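
To make those corner cases concrete, here is a small sketch over the sample data from the question, assuming the keys are zero-padded date strings (which sort chronologically as plain text). The helper name `value_on_or_before` and its `default` parameter are hypothetical. It uses `bisect.bisect_right` so that an exact match returns its own value:

```python
import bisect

# Sample data from the question, already sorted by key.
DATES = ["2001/01/01", "2001/01/02", "2001/01/03",
         "2001/01/04", "2001/01/07", "2001/01/09"]
VALUES = [123, 42, 100, 314, 312, 321]

_MISSING = object()  # sentinel so that None can be a legitimate default


def value_on_or_before(key, default=_MISSING):
    """Return the value for key, or for the nearest key that precedes it."""
    i = bisect.bisect_right(DATES, key) - 1
    if i < 0:
        # key is smaller than all keys: nothing precedes it.
        if default is _MISSING:
            raise KeyError(key)
        return default
    return VALUES[i]


print(value_on_or_before("2001/01/04"))  # exact match -> 314
print(value_on_or_before("2001/01/05"))  # falls back to 2001/01/04 -> 314
print(value_on_or_before("2001/12/31"))  # greater than all keys -> 321
print(value_on_or_before("2000/12/31", default=None))  # smaller than all keys -> None
```

Returning the last value for a key beyond the final date, and raising (or returning a default) for a key before the first date, is only one reasonable policy; either branch can be swapped for the other.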

Alex Martelli answered Oct 12 '22 13:10