Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is a duplicate object being added to my Python set when it shouldn't be?

Tags:

python

set

I have a class definition of this:

class Question:
    title = ""
    answer = ""
    def __init__(self, title, answer):
            self.title = title
            self.answer = answer
    def __eq__(self, other):
            return self.title == other.title and self.answer == other.answer
    def __hash__(self):
        return hash(repr(self))

and I'm trying to add many of these objects to a set, only if the object does not have the same properties as any of the other objects already in the set:

        questionset = set()
        q = Question(questionparts[0] + questionparts[1], questionparts[2])
        if q not in questionset:
            questionset.add(q)

If I have two questions, each with the same property values, I expect only one to get added to my set, instead my set has a length of 2.

What am I doing wrong? If I log each question object, I can confirm the items have the same property values.

like image 711
marked-down Avatar asked Sep 21 '25 03:09

marked-down


2 Answers

Your hash function is grossly ineffective. Python requires that your __hash__ function should return the same value for two objects that are considered equal, but yours doesn't. From the object.__hash__ documentation:

The only required property is that objects which compare equal have the same hash value

repr(self) returns the default representation, which uses the object id. It'll basically return a different hash based on object identity. You may as well have done:

return hash(id(self))

That's not a good hash as that values differs between all instances. As a result your hash() values fail to meet the required property:

>>> a = Question('foo', 'bar')
>>> b = Question('foo', 'bar')
>>> a == b
True
>>> hash(a) == hash(b)
False

You need to hash your attributes instead:

return hash(self.title + self.answer)

Now the hash is based on the same values that inform equality.

like image 195
Martijn Pieters Avatar answered Sep 22 '25 18:09

Martijn Pieters


As Martijn Pieters and BrenBarn have mentioned, your __hash__() method doesn't work because of the behaviour of the default __repr__() method. Their answers show alternate implementations of __hash__(); the code below instead implements a simple but useful __repr__().

I'm not claiming that my approach is better than Martijn's or BrenBarn's approach. I'm simply showing that your original __hash__() method will work once your class has a suitable __repr__() method.

FWIW, it's generally a good idea to implement a __repr__() method for your classes. Even if you don't actually need the method in your final production code it can be helpful for testing purposes while developing the code.

I'm running this on Python 2.6.6, hence the from __future__ import and the explicit inheritance from object; you don't need to do those things in Python 3, but the code should still function properly on Python 3 as is.

#!/usr/bin/env python

from __future__ import print_function

class Question(object):
    def __init__(self, title, answer):
        self.title = title
        self.answer = answer

    def __repr__(self):
        return 'Q({s.title!r}, {s.answer!r})'.format(s=self)

    def __eq__(self, other):
        return self.title == other.title and self.answer == other.answer

    def __hash__(self):
        return hash(repr(self))

questionset = set()

q1 = Question('2+2', '4')
q2 = Question('2+2', '4')
q3 = Question('2*2', '4')

print(q1, q2, q3)
print(q1 == q2, q1 == q3)

questionset.update((q1, q2, q3))
print(questionset)

questionset.add(Question('1+3', '4'))
questionset.add(Question('2+3', '5'))
questionset.add(Question('2*3', '6'))
questionset.add(Question('1+3', '4'))
print(questionset)

output

Q('2+2', '4') Q('2+2', '4') Q('2*2', '4')
True False
set([Q('2+2', '4'), Q('2*2', '4')])
set([Q('2+3', '5'), Q('2+2', '4'), Q('1+3', '4'), Q('2*3', '6'), Q('2*2', '4')])

Note that there's no need to test if a Question instance is already a member of questionset. Set elements are unique, so it's impossible to add the same element multiple times.

like image 28
PM 2Ring Avatar answered Sep 22 '25 16:09

PM 2Ring