Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python hashable dicts

Tags:

python

People also ask

Are Dicts hashable Python?

All immutable built-in objects in Python are hashable like tuples while the mutable containers like lists and dictionaries are not hashable. Objects which are instances of the user-defined class are hashable by default, they all compare unequal, and their hash value is their id().

What does it mean to be hashable in Python?

An object is said to be hashable if it has a hash value that remains the same during its lifetime. It has a __hash__() method and it can be compared to other objects. For this, it needs the __eq__() or __cmp__()method. If hashable objects are equal when compared, then they have same hash value.

How do I make a list hashable in Python?

Just use a tuple as a key. Tuples are immutable and hashable, so they're useful as dictionary keys. list_of_ints = [1, 20, 3, 4] # tuple(list_of_ints) == (1, 20, 3, 4) some_dict = {tuple(list_of_ints): "some value", ...}

Which datatypes are hashable in Python?

Hashable data types: int , float , str , tuple , and NoneType . Unhashable data types: dict , list , and set .


Here is the easy way to make a hashable dictionary. Just remember not to mutate them after embedding in another dictionary for obvious reasons.

class hashabledict(dict):
    def __hash__(self):
        return hash(tuple(sorted(self.items())))

Hashables should be immutable -- not enforcing this but TRUSTING you not to mutate a dict after its first use as a key, the following approach would work:

class hashabledict(dict):
  def __key(self):
    return tuple((k,self[k]) for k in sorted(self))
  def __hash__(self):
    return hash(self.__key())
  def __eq__(self, other):
    return self.__key() == other.__key()

If you DO need to mutate your dicts and STILL want to use them as keys, complexity explodes hundredfolds -- not to say it can't be done, but I'll wait until a VERY specific indication before I get into THAT incredible morass!-)


All that is needed to make dictionaries usable for your purpose is to add a __hash__ method:

class Hashabledict(dict):
    def __hash__(self):
        return hash(frozenset(self))

Note, the frozenset conversion will work for all dictionaries (i.e. it doesn't require the keys to be sortable). Likewise, there is no restriction on the dictionary values.

If there are many dictionaries with identical keys but with distinct values, it is necessary to have the hash take the values into account. The fastest way to do that is:

class Hashabledict(dict):
    def __hash__(self):
        return hash((frozenset(self), frozenset(self.itervalues())))

This is quicker than frozenset(self.iteritems()) for two reasons. First, the frozenset(self) step reuses the hash values stored in the dictionary, saving unnecessary calls to hash(key). Second, using itervalues will access the values directly and avoid the many memory allocator calls using by items to form new many key/value tuples in memory every time you do a lookup.


The given answers are okay, but they could be improved by using frozenset(...) instead of tuple(sorted(...)) to generate the hash:

>>> import timeit
>>> timeit.timeit('hash(tuple(sorted(d.iteritems())))', "d = dict(a=3, b='4', c=2345, asdfsdkjfew=0.23424, x='sadfsadfadfsaf')")
4.7758948802947998
>>> timeit.timeit('hash(frozenset(d.iteritems()))', "d = dict(a=3, b='4', c=2345, asdfsdkjfew=0.23424, x='sadfsadfadfsaf')")
1.8153600692749023

The performance advantage depends on the content of the dictionary, but in most cases I've tested, hashing with frozenset is at least 2 times faster (mainly because it does not need to sort).


A reasonably clean, straightforward implementation is

import collections

class FrozenDict(collections.Mapping):
    """Don't forget the docstrings!!"""

    def __init__(self, *args, **kwargs):
        self._d = dict(*args, **kwargs)

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)

    def __getitem__(self, key):
        return self._d[key]

    def __hash__(self):
        return hash(tuple(sorted(self._d.iteritems())))

I keep coming back to this topic... Here's another variation. I'm uneasy with subclassing dict to add a __hash__ method; There's virtually no escape from the problem that dict's are mutable, and trusting that they won't change seems like a weak idea. So I've instead looked at building a mapping based on a builtin type that is itself immutable. although tuple is an obvious choice, accessing values in it implies a sort and a bisect; not a problem, but it doesn't seem to be leveraging much of the power of the type it's built on.

What if you jam key, value pairs into a frozenset? What would that require, how would it work?

Part 1, you need a way of encoding the 'item's in such a way that a frozenset will treat them mainly by their keys; I'll make a little subclass for that.

import collections
class pair(collections.namedtuple('pair_base', 'key value')):
    def __hash__(self):
        return hash((self.key, None))
    def __eq__(self, other):
        if type(self) != type(other):
            return NotImplemented
        return self.key == other.key
    def __repr__(self):
        return repr((self.key, self.value))

That alone puts you in spitting distance of an immutable mapping:

>>> frozenset(pair(k, v) for k, v in enumerate('abcd'))
frozenset([(0, 'a'), (2, 'c'), (1, 'b'), (3, 'd')])
>>> pairs = frozenset(pair(k, v) for k, v in enumerate('abcd'))
>>> pair(2, None) in pairs
True
>>> pair(5, None) in pairs
False
>>> goal = frozenset((pair(2, None),))
>>> pairs & goal
frozenset([(2, None)])

D'oh! Unfortunately, when you use the set operators and the elements are equal but not the same object; which one ends up in the return value is undefined, we'll have to go through some more gyrations.

>>> pairs - (pairs - goal)
frozenset([(2, 'c')])
>>> iter(pairs - (pairs - goal)).next().value
'c'

However, looking values up in this way is cumbersome, and worse, creates lots of intermediate sets; that won't do! We'll create a 'fake' key-value pair to get around it:

class Thief(object):
    def __init__(self, key):
        self.key = key
    def __hash__(self):
        return hash(pair(self.key, None))
    def __eq__(self, other):
        self.value = other.value
        return pair(self.key, None) == other

Which results in the less problematic:

>>> thief = Thief(2)
>>> thief in pairs
True
>>> thief.value
'c'

That's all the deep magic; the rest is wrapping it all up into something that has an interface like a dict. Since we're subclassing from frozenset, which has a very different interface, there's quite a lot of methods; we get a little help from collections.Mapping, but most of the work is overriding the frozenset methods for versions that work like dicts, instead:

class FrozenDict(frozenset, collections.Mapping):
    def __new__(cls, seq=()):
        return frozenset.__new__(cls, (pair(k, v) for k, v in seq))
    def __getitem__(self, key):
        thief = Thief(key)
        if frozenset.__contains__(self, thief):
            return thief.value
        raise KeyError(key)
    def __eq__(self, other):
        if not isinstance(other, FrozenDict):
            return dict(self.iteritems()) == other
        if len(self) != len(other):
            return False
        for key, value in self.iteritems():
            try:
                if value != other[key]:
                    return False
            except KeyError:
                return False
        return True
    def __hash__(self):
        return hash(frozenset(self.iteritems()))
    def get(self, key, default=None):
        thief = Thief(key)
        if frozenset.__contains__(self, thief):
            return thief.value
        return default
    def __iter__(self):
        for item in frozenset.__iter__(self):
            yield item.key
    def iteritems(self):
        for item in frozenset.__iter__(self):
            yield (item.key, item.value)
    def iterkeys(self):
        for item in frozenset.__iter__(self):
            yield item.key
    def itervalues(self):
        for item in frozenset.__iter__(self):
            yield item.value
    def __contains__(self, key):
        return frozenset.__contains__(self, pair(key, None))
    has_key = __contains__
    def __repr__(self):
        return type(self).__name__ + (', '.join(repr(item) for item in self.iteritems())).join('()')
    @classmethod
    def fromkeys(cls, keys, value=None):
        return cls((key, value) for key in keys)

which, ultimately, does answer my own question:

>>> myDict = {}
>>> myDict[FrozenDict(enumerate('ab'))] = 5
>>> FrozenDict(enumerate('ab')) in myDict
True
>>> FrozenDict(enumerate('bc')) in myDict
False
>>> FrozenDict(enumerate('ab', 3)) in myDict
False
>>> myDict[FrozenDict(enumerate('ab'))]
5