
Objects are not considered the same in Dictionary keys - but __eq__ is implemented

The following code gives an error message:

class Test(object):

    def __init__(self, test = 0):
        self.test = test

if __name__ == '__main__':
    t1 = Test(1)
    t2 = Test(2)
    t3 = Test(3)
    t4 = Test(1)
    my_dict = {}
    my_dict[t1] = 1
    my_dict[t2] = 2
    my_dict[t3] = 3

    print(my_dict[t4])

Traceback (most recent call last):
  File "C:\Users\Alexander\Documents\Visual Studio 2015\Projects\PipeProcessTest\PipeProcessTest\DictionaryKeys.py", line 16, in <module>
    print(my_dict[t4])
KeyError: <__main__.Test object at 0x0000000002F18080>

This is due to the fact that Python is treating t1 and t4 as different objects. However, when I implement the comparison operator __eq__ with the following code:

    def __eq__(self, other):
        if self.test == other.test:
            return True
        else:
            return False

I get another error message, "unhashable type: 'Test'", telling me that the dictionary can no longer hash the Test object. How can I fix this so that Python recognizes t1 and t4 as the same key while still being able to hash Test objects?

Alexander Whatley asked Aug 28 '15 20:08


3 Answers

You need to implement __hash__ in addition to __eq__. See the documentation for notes on how to do this. The main thing to keep in mind is that objects that compare equal must have the same hash value. So if you want to compare equality only by looking at the test attribute, then your __hash__ also needs to use only the test attribute. Searching for information about __hash__ and __eq__ will also turn up many previous questions on this site about this.
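As a minimal sketch of that advice (assuming Python 3; the isinstance check and the use of hash() on the attribute are choices made here, not something the question requires), both methods are built from the same test attribute:

```python
class Test(object):
    def __init__(self, test=0):
        self.test = test

    def __eq__(self, other):
        # Only compare against other Test instances; for anything else,
        # let Python fall back to its default handling.
        if not isinstance(other, Test):
            return NotImplemented
        return self.test == other.test

    def __hash__(self):
        # Hash the same attribute that __eq__ compares, so objects that
        # compare equal always have equal hashes.
        return hash(self.test)


t1 = Test(1)
t4 = Test(1)
my_dict = {t1: 1}
print(my_dict[t4])  # → 1
```

Because t1 and t4 compare equal and hash equally, either one finds the entry stored under the other.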

BrenBarn answered Nov 15 '22 04:11


From the question: "This is due to the fact that python is treating t1 and t4 as different objects. However, when I implement the comparison operator 'eq', with the following code: ..."

When you run the following code in Python:

class Test(object):

    def __init__(self, test = 0):
        self.test = test

if __name__ == '__main__':
    t1 = Test(1)
    t2 = Test(2)
    t3 = Test(3)
    t4 = Test(1)
    my_dict = {}
    my_dict[t1] = 1
    my_dict[t2] = 2
    my_dict[t3] = 3

This means that you are creating a dict whose keys are Test objects. Python first checks whether the keys are hashable.

An object is hashable when hash(obj), which calls obj.__hash__(), returns an integer. By default, instances of user-defined classes are hashable, and their hash value is derived from id(self), so two distinct instances normally hash differently.
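A quick way to check hashability is to call hash() on an instance. The sketch below assumes Python 3, and Plain and EqOnly are hypothetical class names used only for illustration. It also shows where the "unhashable type" error in the question comes from: in Python 3, a class that defines __eq__ without defining __hash__ has its __hash__ set to None.

```python
class Plain(object):
    """Hypothetical class with no __eq__ or __hash__: uses object's defaults."""


class EqOnly(object):
    """Hypothetical class that defines __eq__ only."""

    def __init__(self, test=0):
        self.test = test

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.test == other.test


# The default hash is an ordinary integer.
default_hash = hash(Plain())
print(isinstance(default_hash, int))

# Defining __eq__ alone makes instances unhashable in Python 3.
try:
    hash(EqOnly(1))
    unhashable = False
except TypeError as exc:
    unhashable = True
    print(exc)
```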

Because the default hash is based on the id, real values look something like 8772302607193. To keep the picture simple, the ids below are made up, and the hash table built from them might look like this.

Let's assume ids like these:

id(t1) = 1
id(t2) = 4   # These are just assumptions.
id(t3) = 10  # actual id's are long values.
id(t4) = 20

This is how the hash table gets constructed:

    hash     Actual
    value    value  
    ------------------
    | 1    |  t1     |
    ------------------
    | 2    | NULL    |
    ------------------   # Each slot would also hold the stored value;
    | 3    | NULL    |   # for illustration only the keys t1, t2, t3
    ------------------   # are shown.
    | 4    | t2      |
    ------------------
           |
           |
    ------------------
    | 10   |  t3     |
    ------------------
           |
         SO ON

The hash table is constructed like this. When you look up my_dict[t4], Python first gets the hash value of t4 by calling t4.__hash__(); under our assumption, that value is 20.

With the hash value 20 in hand, it checks the hash table at index 20. Since nothing was inserted there, Python raises a KeyError, which is why you were getting a KeyError from my_dict[t4].
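The failed lookup can be reproduced in a few lines. This is only a small demonstration, assuming the Test class from the question with no __eq__ or __hash__ defined; the raised flag is a name introduced here for illustration.

```python
class Test(object):
    def __init__(self, test=0):
        self.test = test


t1 = Test(1)
t4 = Test(1)
my_dict = {t1: 1}

# The default __eq__ compares identity, so two separate instances differ
# even though their test attributes match.
print(t1 == t4)
print(t4 in my_dict)

try:
    my_dict[t4]
    raised = False
except KeyError:
    raised = True
print(raised)
```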

Another Scenario Here:

If you override the __hash__ method of the Test class and repeat the same operations, as below:

class Test(object):

    def __init__(self, test = 0):
        self.test = test
    def __hash__(self):
        return self.test  # Use the value passed to __init__ as the hash

if __name__ == '__main__':
    t1 = Test(1)
    t2 = Test(2)
    t3 = Test(3)
    t4 = Test(1)
    my_dict = {}
    my_dict[t1] = 1
    my_dict[t2] = 2
    my_dict[t3] = 3

Since we've overridden __hash__ to return the value the object was initialized with, these are the hash values we get:

t1 = 1 , t2 = 2, t3 = 3, t4 = 1

This is how the hash table would look if several keys with the same hash value (say, both t1 and t4) were stored:

      hash    Actual
      value   value  
      ------------------
      | 1    | [t1,t4] | # List of values for the same hash value.
      ------------------
      | 2    |  t2     |
      ------------------ # Each slot would also hold the stored value;
      | 3    |  t3     | # for illustration only the keys t1, t2, ...
      ------------------ # are shown.
      | 4    |  NULL   |
      ------------------
           |
         SO ON

In this situation, when you look up my_dict[t4], Python first calls t4.__hash__(), which returns 1, as before. The dict then checks index 1 of the hash table and finds more than one candidate with that hash: [t1, t4].

This is where __eq__ comes in: it identifies the right object among several entries that share a hash value. If each instance should remain a distinct key, compare by identity, which is also what the default __eq__ does:

class Test(object):
    def __init__(self, test = 0):
        self.test = test
    def __hash__(self):
        return self.test
    def __eq__(self, other):
        return self is other

In your case, where t1 and t4 should be treated as the same key, __eq__ only needs to compare the self.test values:

class Test(object):

    def __init__(self, test = 0):
        self.test = test
    def __hash__(self):
        return self.test
    def __eq__(self, other):
        return other.test == self.test

This is how you can manage your dict values!
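The two-step lookup described above (hash first, then __eq__) can be sketched with a toy table. ToyDict is a hypothetical class written for this illustration; it uses a list of buckets with chaining to match the diagrams, whereas CPython's real dict uses open addressing, so treat it as a simplified model rather than the actual implementation.

```python
class ToyDict(object):
    """Hypothetical, simplified hash table: a list of buckets with chaining."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Step 1: the hash value selects the bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            # Step 2: identity or __eq__ decides whether the key is already stored.
            if existing is key or existing == key:
                bucket[i] = (existing, value)
                return
        bucket.append((key, value))

    def __getitem__(self, key):
        for existing, value in self._bucket(key):
            if existing is key or existing == key:
                return value
        raise KeyError(key)


class Test(object):
    def __init__(self, test=0):
        self.test = test

    def __hash__(self):
        return hash(self.test)

    def __eq__(self, other):
        return isinstance(other, Test) and other.test == self.test


table = ToyDict()
table[Test(1)] = 1
table[Test(2)] = 2
print(table[Test(1)])  # → 1
```

A fresh Test(1) finds the stored entry because it lands in the same bucket and compares equal to the stored key; a key with no equal counterpart in its bucket raises KeyError.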

gsb-eng answered Nov 15 '22 06:11


You just need to return self.test from the __hash__ method, so that each object's hash value is based on its test attribute. Then t1 and t4 have the same hash value, and a dict lookup on t1 or t4 returns the same value:

class Test(object):
    def __init__(self, test = 0):
        self.test = test
    def __eq__(self, other):
        return self.test == other.test
    def __hash__(self):
        return self.test

The if/else is not needed in your __eq__; you can simply return the result of self.test == other.test:

In [2]: t1 = Test(1)   
In [3]: t2 = Test(2)   
In [4]: t3 = Test(3)   
In [5]: t4 = Test(1)
In [6]:  my_dict = {}    
In [7]: print(t1 == t4)
True   
In [8]: my_dict[t1] = 1    
In [9]: my_dict[t2] = 2    
In [10]: my_dict[t3] = 3
In [11]: print(my_dict[t4])
1
Padraic Cunningham answered Nov 15 '22 06:11