Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python set() and __hash__ confusion

Tags:

python

hash

set

One of the classes I defined is used in a set() to filter out equal objects. But it doesn't work as I expect it, so I obviously understand something wrong.

class Foo(object):

    def __hash__(self):
        return 7

x = set()
x.add(Foo())
assert len(x) == 1
x.add(Foo())
assert len(x) == 1    # AssertionError

I expect the set to consist of only one element, but it has two. Why is that?

like image 331
Niklas R Avatar asked Jun 28 '26 02:06

Niklas R


1 Answers

Hash collisions are know to occur in sets (hash maps), no hash algorithm is good enough to have a unique hash for every item, or else it would take a long time to calculate. When a collision does occur, python falls back to checking the equality of the values with __eq__ to make sure they are not the same.

class Foo(object):

    def __hash__(self):
        return 7
    def __eq__(self, other):
        return True

>>> x = set()
>>> x.add(Foo())
>>> assert len(x) == 1
>>> x.add(Foo())
>>> assert len(x) == 1
>>> 

This is a reason why you see frightening runtimes over here but note that you can expect O(1) amortized membership checks in sets, even though they have O(N) worst case (everything is a hash collision). The worst case is very very very unlikely to occur due to Python's smart implementation.

like image 57
jamylak Avatar answered Jun 29 '26 16:06

jamylak