Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When does __eq__ gets called using hash()?

As mentioned here,

Below code,

class Person(object):
     def __init__(self, name, ssn, address):
         self.name = name
         self.ssn = ssn
         self.address = address
     def __hash__(self):
         print('in hash')
         return hash(self.ssn)
     def __eq__(self, other):
         print('in eq')
         return self.ssn == other.ssn

bob = Person('bob', '1111-222-333', None)

jim = Person('jim bo', '1111-222-333', 'sf bay area')


dmv_appointments = {}
print('calling hash')
dmv_appointments[bob] = 'tomorrow'
print('calling hash')
print(dmv_appointments[jim])
print('calling hash again')
print(dmv_appointments[bob])

Output:

calling hash
in hash
calling hash
in hash
in eq
tomorrow
calling hash again
in hash
tomorrow

Question:

Why __eq__ gets called on accessing jimbut not on bob?

like image 931
overexchange Avatar asked Jun 13 '17 20:06

overexchange


2 Answers

Short answer: a dictionary lookup first does a (cheap) reference equality check (x is y) when searching a bucket, and only if that fails, a (more expensive) equality check (x == y) is done.

Scenario

The __hash__ function does not call __eq__ internally. Given you construct bob and jim, no such methods are called.

Next you associate bob with 'tomorrow'. In order to know in which bucket of the dictionary, you have to store bob, you calculate the hash. Now once you have done that we store bob (and the value in the correct bucket).

Next we want to obtain jim. In order to know in which bucket jim resides, we calculate the hash. Next we start searching in the bucket. The bucket will contain bob. We first perform a reference check (jim is bob) but that fails, so then we fallback on the equality check. That check succeeds, so we return the value corresponding with bob: 'tomorrow'.

The same scenario happens when we want to look for bob: we calculate the hash, fetch the bucket. Perform a reference check on bob is bob, and that one succeeds. So we do not need a (probably more expensive equality check). We simply return the value 'tomorrow'.

Reference checks

The fact that a reference check is done first can be proven with the following (unhealthy) code:

class Person(object):
     def __init__(self, name, ssn, address):
         self.name = name
         self.ssn = ssn
         self.address = address
     def __hash__(self):
         print('in hash')
         return hash(self.ssn)
     def __eq__(self, other):
         print('in eq')
         return False

Here we return always False for equality. So even:

>>> bob == bob
in eq
False
>>> bob is bob
True

bob is not equal to itself (this is actually not good design, since for a dictionary, it is a contract that an object is equal to itself: a good equality relation is reflexive, symmetrical and transitive). Nevertheless, if we associate bob with 'tomorrow', we are still able to fetch the value associated with bob:

>>> dmv_appointments = {}
>>> dmv_appointments[bob] = 'tomorrow'
in hash
>>> dmv_appointments[bob]
in hash
'tomorrow'
like image 78
Willem Van Onsem Avatar answered Nov 14 '22 18:11

Willem Van Onsem


To answer the title:

When does __eq__ gets called using hash()?

Never.


The other question:

Why __eq__ gets called on accessing jim but not on bob?

That's more complicated. To understand that you need to know how a dictionary is implemented. Assuming CPython it will be a table containing a hash column, a key column and a value column:

hash     | key       | value
-----------------------------------------
    -    |      -    |        -
-----------------------------------------
    -    |      -    |        -

It will have a certain size but it won't be big enough to contain every possible hash value, so it will calculate the position based on the hash. For example if you add bob it could have (string hashes are randomized in certain CPython versions so the actual result will differ) a hash of 7475314405837642385. Assuming the dictionary has an actual size of 2 (in reality it will be bigger, but that would unnecessarily waste space in the answer) it just takes the modulo, so it will place it in 7475314405837642385 % 2 == 1:

hash     | key       | value
-----------------------------------------
    -    |      -    |        -
-----------------------------------------
747...385|    bob    |    'tomorrow'

When you want to look up a key it

  • always calculate the hash of the lookup
  • then it will calculate the position.
  • Then it compares the hash of the lookup to the hash saved in that position
  • if the hashes are equal then it will compare the lookup and the saved key with PyObject_RichCompareBool. That will:
    • first check for identity: lookup is key
    • if that is False it will check lookup == key

So in case you lookup bob:

  • hash: 7475314405837642385
  • position: 7475314405837642385 % 2 -> 1
  • found an entry, so compare the hashes: 7475314405837642385 == 7475314405837642385
  • that was equal so check for identity: bob is bob -> True

So it returns 'tomorrow' without an equality check. In the second case it checks for jim:

  • hash: 7475314405837642385
  • position: 7475314405837642385 % 2 -> 1
  • found an entry, so compare the hashes: 7475314405837642385 == 7475314405837642385
  • that was equal so check for identity: jim is bob -> False
  • check for equality jim == bob -> True

So it returns 'tomorrow'.


This is just an approximation of the actual implementation (it's missing some details). It gets more complicated if the hashes are not equal or the lookup is not key and lookup != key but these are not really important to understand the observed behavior you questioned.

However, I really need to say this: What you're doing is really dangerous because your class isn't immutable. You could accidentally make the saved dictionary entry unavailable to you:

dmv_appointments = {bob: 1}
bob.ssn = '1'     # changing ssn changes the hash!
dmv_appointments[bob]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-35-3920ada7bab1> in <module>()
     15 dmv_appointments = {bob: 1}
     16 bob.ssn = '1'
---> 17 dmv_appointments[bob]

KeyError: <__main__.Person object at 0x000001BD5DDCC470>

(It could still work in case the new hash is equal to the "old" hash, but that would be rather accidental).

That's because while you alter the hash of your instance - the dictionary won't update the saved hash because it assumes all keys are immutable! So the dictionary would either assume it would be saved in another position or if the position would (miraculously) be the same then it would fail in the step where it contains the actual hashes.

like image 37
MSeifert Avatar answered Nov 14 '22 17:11

MSeifert