I have two sets of custom objects that I build from the following tuples of dictionaries:
tupleOfDicts1 = ({'id': 1, 'name': 'peter', 'last': 'smith'},
                 {'id': 2, 'name': 'peter', 'last': 'smith'},
                 {'id': 3, 'name': 'mark', 'last': 'white'},
                 {'id': 4, 'name': 'john', 'last': 'lennon'},)
tupleOfDicts2 = ({'id': 9, 'name': 'peter', 'last': 'smith'},
                 {'id': 8, 'name': 'peter', 'last': 'smith'},)
As you can see, some elements are identical except for the 'id' property.
Then I define the following class:
class Result:
    def __init__(self, **kwargs):
        self.id = kwargs['id']
        self.name = kwargs['name']
        self.last = kwargs['last']

    def __repr__(self):
        return 'Result(%s, %s, %s)' % (self.id, self.name, self.last)

    def __hash__(self):
        # the hash includes the id of the elements, but the id must not be considered in the comparison
        return hash((self.id, self.name, self.last))

    def __eq__(self, other):
        if isinstance(other, Result):
            # I want the comparison to consider only name and last
            return (self.name, self.last) == (other.name, other.last)
        else:
            return False

    def __ne__(self, other):
        return not self.__eq__(other)
As you can see, the constructor accepts the dictionaries directly as keyword arguments.
Next I define a function that builds a set of Result objects from a tuple of dictionaries:
def getSetFromTuple(tupleOfDicts):
    myset = set()
    for dictionary in tupleOfDicts:
        myset.add(Result(**dictionary))
    return myset
At this point I create my two sets:
mySet1 = getSetFromTuple(tupleOfDicts1)
mySet2 = getSetFromTuple(tupleOfDicts2)
I do all this because I want every element of mySet1 that is not in mySet2 (and I do not want the 'id' property to be involved in this comparison):
diff = mySet1 - mySet2
But I am not getting what I want; in this case I get all the elements of mySet1:
print(len(mySet1 - mySet2)) # 4
Instead, I expect only two elements of mySet1 to remain, because two of its elements are also in mySet2 (same name and same last; only the id, which will always be different, differs).
It seems to me that when I apply the - operator to two sets, the hash values of the elements are compared, in which case the output of 4 makes sense. But is there a way to do what I want?
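The hypothesis can be checked with a cut-down copy of the class. In this sketch the class is renamed to the hypothetical `ResultWithIdHash` and stores `name`/`last` directly; like the question's version, `__eq__` ignores the id while `__hash__` includes it:

```python
class ResultWithIdHash:
    """Mirrors the question's class: __eq__ ignores id, __hash__ does not."""

    def __init__(self, **kwargs):
        self.id = kwargs['id']
        self.name = kwargs['name']
        self.last = kwargs['last']

    def __hash__(self):
        # id is part of the hash, so objects that compare equal get different hashes
        return hash((self.id, self.name, self.last))

    def __eq__(self, other):
        if isinstance(other, ResultWithIdHash):
            return (self.name, self.last) == (other.name, other.last)
        return NotImplemented


a = ResultWithIdHash(id=1, name='peter', last='smith')
b = ResultWithIdHash(id=9, name='peter', last='smith')

print(a == b)              # True: __eq__ ignores id
print(hash(a) == hash(b))  # False: the hashes differ because of the id
print(len({a, b}))         # 2: the set never treats them as duplicates
print(b in {a})            # False
print(len({a} - {b}))      # 1: the difference keeps a
```

Equality alone is not enough: a set only falls back to `__eq__` for elements whose hashes match, which is why the difference keeps everything.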
Contrary to the comment in your __hash__, id should not be part of the hash. If two objects compare equal, their hashes must be equal as well:
    def __hash__(self):
        return hash((self.name, self.last))
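Putting the fix together, here is a sketch of the corrected class run against the question's data. Besides the hash change, it assumes two small adjustments: the attributes are stored as `name` and `last` (what `__repr__`, `__hash__` and `__eq__` read), and `__eq__` returns `NotImplemented` for other types. `__ne__` is left out because Python 3 derives it from `__eq__`. The sets are built with set comprehensions, which behave like `getSetFromTuple`:

```python
class Result:
    def __init__(self, **kwargs):
        self.id = kwargs['id']          # still stored, just not compared or hashed
        self.name = kwargs['name']
        self.last = kwargs['last']

    def __repr__(self):
        return 'Result(%s, %s, %s)' % (self.id, self.name, self.last)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares; id is excluded
        return hash((self.name, self.last))

    def __eq__(self, other):
        if isinstance(other, Result):
            return (self.name, self.last) == (other.name, other.last)
        return NotImplemented  # let Python try the other operand


tupleOfDicts1 = ({'id': 1, 'name': 'peter', 'last': 'smith'},
                 {'id': 2, 'name': 'peter', 'last': 'smith'},
                 {'id': 3, 'name': 'mark', 'last': 'white'},
                 {'id': 4, 'name': 'john', 'last': 'lennon'},)
tupleOfDicts2 = ({'id': 9, 'name': 'peter', 'last': 'smith'},
                 {'id': 8, 'name': 'peter', 'last': 'smith'},)

mySet1 = {Result(**d) for d in tupleOfDicts1}
mySet2 = {Result(**d) for d in tupleOfDicts2}

diff = mySet1 - mySet2
print(len(mySet1), len(mySet2), len(diff))     # 3 1 2
print(sorted((r.name, r.last) for r in diff))  # [('john', 'lennon'), ('mark', 'white')]
```

Note that mySet1 now holds 3 elements, not 4: the two 'peter smith' rows are equal, so the set keeps only the first one added. The difference still leaves the two expected elements.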
Internally, the hash maps each value to a bucket. Elements with different hashes may end up in different buckets and then are never compared at all when a set de-duplicates them or a dictionary looks them up.
That said, there is a much simpler way to get your results that avoids OOP and works directly with the data:
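The bucket argument can be observed by counting `__eq__` calls. The `Probe` class below is hypothetical and takes an explicit hash value; the behavior relies on CPython's set implementation, which compares stored hashes before calling `__eq__`, so other interpreters might differ:

```python
class Probe:
    eq_calls = 0  # class-level counter shared by all instances

    def __init__(self, key, hash_value):
        self.key = key
        self.hash_value = hash_value

    def __hash__(self):
        return self.hash_value

    def __eq__(self, other):
        Probe.eq_calls += 1
        return isinstance(other, Probe) and self.key == other.key


# Same key, different hashes: __eq__ is never consulted, both are kept
Probe.eq_calls = 0
different = {Probe('peter smith', 1), Probe('peter smith', 2)}
print(len(different), Probe.eq_calls)  # 2 0

# Same key, same hash: __eq__ is consulted and the duplicate is dropped
Probe.eq_calls = 0
same = {Probe('peter smith', 1), Probe('peter smith', 1)}
print(len(same), Probe.eq_calls)  # 1 1
```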
dictionaries = tupleOfDicts1 + tupleOfDicts2
unique_values = {(d['name'], d['last']): d for d in dictionaries}.values()
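If what you need is specifically the question's difference (rows of tupleOfDicts1 whose name/last pair never appears in tupleOfDicts2), the same keying idea can be used as a filter. A minimal sketch, assuming name and last are the only fields that define a match; the helper `key` is hypothetical:

```python
tupleOfDicts1 = ({'id': 1, 'name': 'peter', 'last': 'smith'},
                 {'id': 2, 'name': 'peter', 'last': 'smith'},
                 {'id': 3, 'name': 'mark', 'last': 'white'},
                 {'id': 4, 'name': 'john', 'last': 'lennon'},)
tupleOfDicts2 = ({'id': 9, 'name': 'peter', 'last': 'smith'},
                 {'id': 8, 'name': 'peter', 'last': 'smith'},)


def key(d):
    # Comparison identity of a row: id is deliberately left out
    return (d['name'], d['last'])


# (name, last) pairs present in the second tuple; a set gives fast membership tests
keys2 = {key(d) for d in tupleOfDicts2}

# Keep every row of the first tuple whose (name, last) is absent from the second
only_in_1 = [d for d in tupleOfDicts1 if key(d) not in keys2]
print(only_in_1)
# [{'id': 3, 'name': 'mark', 'last': 'white'}, {'id': 4, 'name': 'john', 'last': 'lennon'}]
```

Unlike the set-based version, this keeps the original order and ids, and it would keep duplicates within tupleOfDicts1 rather than collapsing them.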