As mentioned here,
Below code,
class Person(object):
def __init__(self, name, ssn, address):
self.name = name
self.ssn = ssn
self.address = address
def __hash__(self):
print('in hash')
return hash(self.ssn)
def __eq__(self, other):
print('in eq')
return self.ssn == other.ssn
bob = Person('bob', '1111-222-333', None)
jim = Person('jim bo', '1111-222-333', 'sf bay area')
dmv_appointments = {}
print('calling hash')
dmv_appointments[bob] = 'tomorrow'
print('calling hash')
print(dmv_appointments[jim])
print('calling hash again')
print(dmv_appointments[bob])
Output:
calling hash
in hash
calling hash
in hash
in eq
tomorrow
calling hash again
in hash
tomorrow
Question:
Why __eq__
gets called on accessing jim
but not on bob
?
Short answer: a dictionary lookup first does a (cheap) reference equality check (x is y
) when searching a bucket, and only if that fails, a (more expensive) equality check (x == y
) is done.
The __hash__
function does not call __eq__
internally. Given you construct bob
and jim
, no such methods are called.
Next you associate bob
with 'tomorrow'
. In order to know in which bucket of the dictionary, you have to store bob
, you calculate the hash. Now once you have done that we store bob
(and the value in the correct bucket).
Next we want to obtain jim
. In order to know in which bucket jim
resides, we calculate the hash. Next we start searching in the bucket. The bucket will contain bob
. We first perform a reference check (jim is bob
) but that fails, so then we fallback on the equality check. That check succeeds, so we return the value corresponding with bob
: 'tomorrow'
.
The same scenario happens when we want to look for bob
: we calculate the hash, fetch the bucket. Perform a reference check on bob is bob
, and that one succeeds. So we do not need a (probably more expensive equality check). We simply return the value 'tomorrow'
.
The fact that a reference check is done first can be proven with the following (unhealthy) code:
class Person(object):
def __init__(self, name, ssn, address):
self.name = name
self.ssn = ssn
self.address = address
def __hash__(self):
print('in hash')
return hash(self.ssn)
def __eq__(self, other):
print('in eq')
return False
Here we return always False
for equality. So even:
>>> bob == bob
in eq
False
>>> bob is bob
True
bob
is not equal to itself (this is actually not good design, since for a dictionary, it is a contract that an object is equal to itself: a good equality relation is reflexive, symmetrical and transitive). Nevertheless, if we associate bob
with 'tomorrow'
, we are still able to fetch the value associated with bob
:
>>> dmv_appointments = {}
>>> dmv_appointments[bob] = 'tomorrow'
in hash
>>> dmv_appointments[bob]
in hash
'tomorrow'
To answer the title:
When does
__eq__
gets called using hash()?
Never.
The other question:
Why
__eq__
gets called on accessing jim but not on bob?
That's more complicated. To understand that you need to know how a dictionary is implemented. Assuming CPython it will be a table containing a hash
column, a key
column and a value
column:
hash | key | value
-----------------------------------------
- | - | -
-----------------------------------------
- | - | -
It will have a certain size but it won't be big enough to contain every possible hash
value, so it will calculate the position based on the hash
. For example if you add bob
it could have (string hashes are randomized in certain CPython versions so the actual result will differ) a hash
of 7475314405837642385
. Assuming the dictionary has an actual size of 2 (in reality it will be bigger, but that would unnecessarily waste space in the answer) it just takes the modulo, so it will place it in 7475314405837642385 % 2 == 1
:
hash | key | value
-----------------------------------------
- | - | -
-----------------------------------------
747...385| bob | 'tomorrow'
When you want to look up a key it
hash
of the lookuphash
of the lookup to the hash
saved in that positionhash
es are equal then it will compare the lookup and the saved key
with PyObject_RichCompareBool
. That will:
lookup is key
False
it will check lookup == key
So in case you lookup bob
:
7475314405837642385
7475314405837642385 % 2
-> 1
hash
es: 7475314405837642385 == 7475314405837642385
bob is bob
-> True
So it returns 'tomorrow'
without an equality check. In the second case it checks for jim
:
7475314405837642385
7475314405837642385 % 2
-> 1
hash
es: 7475314405837642385 == 7475314405837642385
jim is bob
-> False
jim == bob
-> True
So it returns 'tomorrow'
.
This is just an approximation of the actual implementation (it's missing some details). It gets more complicated if the hash
es are not equal or the lookup is not key and lookup != key
but these are not really important to understand the observed behavior you questioned.
However, I really need to say this: What you're doing is really dangerous because your class isn't immutable. You could accidentally make the saved dictionary entry unavailable to you:
dmv_appointments = {bob: 1}
bob.ssn = '1' # changing ssn changes the hash!
dmv_appointments[bob]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-35-3920ada7bab1> in <module>()
15 dmv_appointments = {bob: 1}
16 bob.ssn = '1'
---> 17 dmv_appointments[bob]
KeyError: <__main__.Person object at 0x000001BD5DDCC470>
(It could still work in case the new hash
is equal to the "old" hash, but that would be rather accidental).
That's because while you alter the hash
of your instance - the dictionary won't update the saved hash
because it assumes all keys are immutable! So the dictionary would either assume it would be saved in another position or if the position would (miraculously) be the same then it would fail in the step where it contains the actual hashes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With