I have a class definition of this:
class Question:
title = ""
answer = ""
def __init__(self, title, answer):
self.title = title
self.answer = answer
def __eq__(self, other):
return self.title == other.title and self.answer == other.answer
def __hash__(self):
return hash(repr(self))
and I'm trying to add many of these objects to a set, only if the object does not have the same properties as any of the other objects already in the set:
questionset = set()
q = Question(questionparts[0] + questionparts[1], questionparts[2])
if q not in questionset:
questionset.add(q)
If I have two questions, each with the same property values, I expect only one to get added to my set, instead my set has a length of 2.
What am I doing wrong? If I log each question object, I can confirm the items have the same property values.
Your hash function is grossly ineffective. Python requires that your __hash__
function should return the same value for two objects that are considered equal, but yours doesn't. From the object.__hash__
documentation:
The only required property is that objects which compare equal have the same hash value
repr(self)
returns the default representation, which uses the object id. It'll basically return a different hash based on object identity. You may as well have done:
return hash(id(self))
That's not a good hash as that values differs between all instances. As a result your hash()
values fail to meet the required property:
>>> a = Question('foo', 'bar')
>>> b = Question('foo', 'bar')
>>> a == b
True
>>> hash(a) == hash(b)
False
You need to hash your attributes instead:
return hash(self.title + self.answer)
Now the hash is based on the same values that inform equality.
As Martijn Pieters and BrenBarn have mentioned, your __hash__()
method doesn't work because of the behaviour of the default __repr__()
method. Their answers show alternate implementations of __hash__()
; the code below instead implements a simple but useful __repr__()
.
I'm not claiming that my approach is better than Martijn's or BrenBarn's approach. I'm simply showing that your original __hash__()
method will work once your class has a suitable __repr__()
method.
FWIW, it's generally a good idea to implement a __repr__()
method for your classes. Even if you don't actually need the method in your final production code it can be helpful for testing purposes while developing the code.
I'm running this on Python 2.6.6, hence the from __future__ import
and the explicit inheritance from object
; you don't need to do those things in Python 3, but the code should still function properly on Python 3 as is.
#!/usr/bin/env python
from __future__ import print_function
class Question(object):
def __init__(self, title, answer):
self.title = title
self.answer = answer
def __repr__(self):
return 'Q({s.title!r}, {s.answer!r})'.format(s=self)
def __eq__(self, other):
return self.title == other.title and self.answer == other.answer
def __hash__(self):
return hash(repr(self))
questionset = set()
q1 = Question('2+2', '4')
q2 = Question('2+2', '4')
q3 = Question('2*2', '4')
print(q1, q2, q3)
print(q1 == q2, q1 == q3)
questionset.update((q1, q2, q3))
print(questionset)
questionset.add(Question('1+3', '4'))
questionset.add(Question('2+3', '5'))
questionset.add(Question('2*3', '6'))
questionset.add(Question('1+3', '4'))
print(questionset)
output
Q('2+2', '4') Q('2+2', '4') Q('2*2', '4')
True False
set([Q('2+2', '4'), Q('2*2', '4')])
set([Q('2+3', '5'), Q('2+2', '4'), Q('1+3', '4'), Q('2*3', '6'), Q('2*2', '4')])
Note that there's no need to test if a Question
instance is already a member of questionset
. Set elements are unique, so it's impossible to add the same element multiple times.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With