As the title suggests I have a question regarding changing objects in Sets such that they become exactly the same (in the eyes of the set). Just curious.
I ask this question with regard to Python, but if it is generalizable feel free to do that.
If I have understood correctly in Python the Set iterable will determine whether objects are 'equal' by equating their hashes. So for objects a and b this would be:
hash(a) == hash(b)
For any object you make you can overwrite the standard hash function, __hash__
, to your specific liking.
Suppose you create a hash function that takes several or all the objects in your object and uses the combination of hashes as its own (e.g. by ORing them).
Now if you have several initially different objects in a single Set, and consequently go over that Set and alter the objects within such that their inner objects match, what will happen to the Set? Will they all remain there, or will they get kicked out, or do we need to wait till an operation is performed on the Set? Or do we raise some error somewhere?
Consider this test:
class A:
def __init__(self, h):
self.h = h
def __hash__(self):
return self.h
x = A(1)
y = A(2)
a = {x, y}
print x in a, y in a
print a
print "----"
x.h = 2
print x in a, y in a
print a
Result:
True True
set([<__main__.A instance at 0x10d94fd40>, <__main__.A instance at 0x10d94fd88>])
----
False True
set([<__main__.A instance at 0x10d94fd40>, <__main__.A instance at 0x10d94fd88>])
As you can see, the first object x
is still there, but the in
operator reports it isn't. Why did this happen?
From my understanding, Set objects are implemented using hash tables, and a hash table normally has a structure like this:
hash_value => list of objects with this hash value
another_hash_value => list of objects with this hash value
When a Set answers to in
requests, it first calculates the hash value for the argument and then tries to locate it in the corresponding list. Our set a
is initially like this:
1 => [x]
2 => [y]
Now, we change the x
's hash and ask the set if the object is there. The set calculates the hash value (which is now 2
) tries to locate x
in the second list and fails - hence False
.
To make things more interesting, let's do
a.add(x)
print x in a, y in a
print a
Result:
True True
set([<__main__.A instance at 0x107cbfd40>,
<__main__.A instance at 0x107cbfd88>,
<__main__.A instance at 0x107cbfd40>])
Now we have the same object twice in the set! As you can see, there's no automatic adjustment and no errors. Python is a grown-ups' language and always assumes you know what you're doing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With