Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Strange Python set and hash behaviour - how does this work?

Tags:

python

hash

set

I have a class called GraphEdge which I would like to be uniquely defined within a set (the built-in set type) by its tail and head members, which are set via __init__.

If I do not define __hash__, I see the following behaviour:

>>> E = GraphEdge('A', 'B')
>>> H = GraphEdge('A', 'B')
>>> hash(E)
139731804758160
>>> hash(H)
139731804760784
>>> S = set()
>>> S.add(E)
>>> S.add(H)
>>> S
set([('A', 'B'), ('A', 'B')])

The set has no way to know that E and H are the same by my definition, since they have differing hashes (which is what the set type uses to determine uniqueness, to my knowledge), so it adds both as distinct elements. So I define a rather naive hash function for GraphEdge like so:

def __hash__( self ):
    return hash( self.tail ) ^ hash( self.head )

Now the above works as expected:

>>> E = GraphEdge('A', 'B')
>>> H = GraphEdge('A', 'B')
>>> hash(E)
409150083
>>> hash(H)
409150083
>>> S = set()
>>> S.add(E)
>>> S.add(H)
>>> S
set([('A', 'B')])

But clearly, ('A', 'B') and ('B', 'A') in this case will return the same hash, so I would expect that I could not add ('B', 'A') to a set already containing ('A', 'B'). But this is not what happens:

>>> E = GraphEdge('A', 'B')
>>> H = GraphEdge('B', 'A')
>>> hash(E)
409150083
>>> hash(H)
409150083
>>> S = set()
>>> S.add(E)
>>> S.add(H)
>>> S
set([('A', 'B'), ('B', 'A')])

So is the set type using the hash or not? If so, how is the last scenario possible? If not, why doesn't the first scenario (no __hash__ defined) work? Am I missing something?

Edit: For reference for future readers, I already had __eq__ defined (also based on tail and head).

like image 537
ezod Avatar asked Jan 29 '10 01:01

ezod


People also ask

Does Python set use hashing?

Sets and their working Set in Python can be defined as the collection of items. In Python, these are basically used to include membership testing and eliminating duplicate entries. The data structure used in this is Hashing, a popular technique to perform insertion, deletion and traversal in O(1) on average.

What is Python hash seed?

Python uses a random hash seed to prevent attackers from tar-pitting your application by sending you keys designed to collide. See the original vulnerability disclosure. By offsetting the hash with a random seed (set once at startup) attackers can no longer predict what keys will collide.

Are hash tables ordered in Python?

As we know, Some of Python's data structures use hash tables for storing items like set or dictionary . So there is no order in these objects.


1 Answers

You have a hash collision. On hash collision, the set uses the == operator to check on whether or not they are truly equal to each other.

like image 143
Noctis Skytower Avatar answered Oct 07 '22 14:10

Noctis Skytower