Python - class __hash__ method and set [duplicate]

I'm using set() together with a class's __hash__ method to prevent objects with the same hash from being added to a set more than once. As I read the Python data model documentation, set() treats objects with the same hash as the same object and adds them only once.

But it behaves differently, as shown below:

class MyClass(object):

    def __hash__(self):
        return 0

result = set()
result.add(MyClass())
result.add(MyClass())

print(len(result)) # len = 2

With string values, however, it works as I expect:

result.add('aida')
result.add('aida')

print(len(result)) # len = 1

My question is: why are objects with the same hash not treated as the same object in a set?

Aida.Mirabadi asked Jul 18 '16 06:07


1 Answer

Your reading is incorrect. Equality checks use the __eq__ method. The documentation only states that the __hash__ values must be the same for any two objects a and b for which a == b (i.e. a.__eq__(b)) is true.

This is a common logic mistake: a == b being true implies that hash(a) == hash(b) is also true. However, an implication is not an equivalence: hash(a) == hash(b) being true does not, conversely, mean that a == b.
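To see the one-way implication in action, here is a minimal sketch that records which special methods a set invokes. The class name Traced and the calls list are made up for this illustration and are not part of the question. It suggests that the set uses the hash only to find candidate entries and then relies on == to decide whether two objects are the same element.

```python
calls = []  # hypothetical log of special-method calls, for illustration only


class Traced:
    def __init__(self, key):
        self.key = key

    def __hash__(self):
        calls.append(('hash', self.key))
        return 0  # every instance collides on purpose

    def __eq__(self, other):
        calls.append(('eq', self.key))
        return isinstance(other, Traced) and self.key == other.key


s = set()
s.add(Traced('a'))
s.add(Traced('b'))  # same hash, but not equal: kept as a second element
s.add(Traced('a'))  # same hash and equal to an existing element: not added

print(len(s))  # 2
print(calls)   # shows that __eq__ is consulted after the hashes collide
```

Equal hashes alone are therefore never enough; they only trigger the equality check.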

To make all instances of MyClass compare equal to each other, you need to provide an __eq__ method for them; otherwise Python compares them by identity. This might do:

class MyClass(object):
    def __hash__(self):
        return 0
    def __eq__(self, other):
        # another object is equal to self, iff 
        # it is an instance of MyClass
        return isinstance(other, MyClass)

Now:

>>> result = set()
>>> result.add(MyClass())
>>> result.add(MyClass())
>>> len(result)
1
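For contrast, here is a small sketch of what happens without __eq__, as in the question. The class name NoEq is hypothetical; it only mirrors the question's class. The default comparison is by identity, so two distinct instances are unequal even though their hashes match.

```python
class NoEq(object):
    def __hash__(self):
        return 0  # same hash for every instance, no __eq__ defined


a = NoEq()
b = NoEq()

print(hash(a) == hash(b))  # True
print(a == b)              # False: default equality compares identity
print(a == a)              # True
print(len({a, b}))         # 2
```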

In practice you'd base __hash__ on the same properties of your object that __eq__ compares, for example:

class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn

    def __eq__(self, other):
        return isinstance(other, Person) and self.ssn == other.ssn

    def __hash__(self):
        # use the hashcode of self.ssn since that is used
        # for equality checks as well
        return hash(self.ssn)

p = Person('Foo Bar', 123456789)
q = Person('Fake Name', 123456789)
print(len({p, q}))  # 1
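
If equality depends on more than one attribute, one common pattern is to hash a tuple of exactly those attributes. The following is a minimal sketch under that assumption; the Point class and its _key helper are hypothetical and not part of the original answer.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def _key(self):
        # the attributes that define equality, gathered in one place
        return (self.x, self.y)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # hash the same tuple that __eq__ compares
        return hash(self._key())


points = {Point(1, 2), Point(1, 2), Point(2, 1)}
print(len(points))  # 2
```

Keeping both methods on the same key makes it hard for them to drift apart, so equal objects keep equal hashes.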