Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What happens when objects in a Set are altered to match each other?

Tags:

python

hash

set

As the title suggests I have a question regarding changing objects in Sets such that they become exactly the same (in the eyes of the set). Just curious.

I ask this question with regard to Python, but if it is generalizable feel free to do that.

If I have understood correctly in Python the Set iterable will determine whether objects are 'equal' by equating their hashes. So for objects a and b this would be:

hash(a) == hash(b)

For any object you make you can overwrite the standard hash function, __hash__, to your specific liking.

Suppose you create a hash function that takes several or all the objects in your object and uses the combination of hashes as its own (e.g. by ORing them).

Now if you have several initially different objects in a single Set, and consequently go over that Set and alter the objects within such that their inner objects match, what will happen to the Set? Will they all remain there, or will they get kicked out, or do we need to wait till an operation is performed on the Set? Or do we raise some error somewhere?

like image 550
Vincent Ketelaars Avatar asked Nov 13 '13 12:11

Vincent Ketelaars


1 Answers

Consider this test:

class A:
    def __init__(self, h):
        self.h = h

    def __hash__(self):
        return self.h

x = A(1)
y = A(2)

a = {x, y}

print x in a, y in a
print a

print "----"

x.h = 2

print x in a, y in a
print a

Result:

True True
set([<__main__.A instance at 0x10d94fd40>, <__main__.A instance at 0x10d94fd88>])
----
False True
set([<__main__.A instance at 0x10d94fd40>, <__main__.A instance at 0x10d94fd88>])

As you can see, the first object x is still there, but the in operator reports it isn't. Why did this happen?

From my understanding, Set objects are implemented using hash tables, and a hash table normally has a structure like this:

 hash_value => list of objects with this hash value
 another_hash_value => list of objects with this hash value

When a Set answers to in requests, it first calculates the hash value for the argument and then tries to locate it in the corresponding list. Our set a is initially like this:

  1 => [x]
  2 => [y]

Now, we change the x's hash and ask the set if the object is there. The set calculates the hash value (which is now 2) tries to locate x in the second list and fails - hence False.

To make things more interesting, let's do

a.add(x)
print x in a, y in a
print a

Result:

True True
set([<__main__.A instance at 0x107cbfd40>, 
     <__main__.A instance at 0x107cbfd88>, 
     <__main__.A instance at 0x107cbfd40>])

Now we have the same object twice in the set! As you can see, there's no automatic adjustment and no errors. Python is a grown-ups' language and always assumes you know what you're doing.

like image 173
georg Avatar answered Oct 09 '22 12:10

georg