
Persistent Hashing of Python Frozen Sets

Tags: python, linux

How would you convert a nesting of Python frozenset objects into a unique integer that was the same across Python sessions and platforms?

For example, I get different values from hash() on different platforms:

32-bit

Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a=frozenset([frozenset([1,2,3]),frozenset(['a','b','c'])]);
>>> hash(a)
1555175235

64-bit

Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a=frozenset([frozenset([1,2,3]),frozenset(['a','b','c'])]);
>>> hash(a)
-6076998737938213053
Asked by Cerin, Dec 26 '11 23:12


1 Answer

How would you convert a nesting of Python frozenset objects into a unique integer that was the same across Python sessions and platforms?

AFAIK hashes are not guaranteed to be unique. In fact, where they are used for lookup tables (as in dictionaries), hash collisions are quite common.

That said, if you want a consistent, non-unique "hash" across platforms, I would try the standard library's hashlib. I can't test it on different platforms, but I believe most of the algorithms implemented there (MD5, for example) are platform-independent.

I would feed the hashing algorithm the pickled version of the sorted set, to make sure the string being hashed is always the same.
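For a nesting of frozensets, sorting needs some care: frozensets compare by subset relation, so sorted() on a collection of frozensets does not give a dependable order. One possible way around this is to encode each member recursively and sort the encoded byte strings instead. The following is only a minimal sketch under stated assumptions: it targets Python 3, handles only ints, strings and frozensets, uses a hand-written encoding in place of pickle, and uses SHA-256 where this answer mentions MD5. The names canonical_bytes and persistent_hash are hypothetical, chosen for illustration.

```python
import hashlib


def canonical_bytes(obj):
    """Encode ints, strings, and (nested) frozensets as deterministic bytes."""
    if isinstance(obj, frozenset):
        # Encode the members first and sort the encodings, so the result
        # does not depend on set iteration order.
        parts = sorted(canonical_bytes(item) for item in obj)
        # Length-prefix each part so adjacent parts cannot run together.
        body = b"".join(str(len(p)).encode("ascii") + b":" + p for p in parts)
        return b"F" + body
    if isinstance(obj, int):
        # int() maps True/False to 1/0, matching how sets treat them.
        return b"I" + str(int(obj)).encode("ascii")
    if isinstance(obj, str):
        return b"S" + obj.encode("utf-8")
    raise TypeError("unsupported type: %r" % type(obj))


def persistent_hash(obj):
    """Return a 256-bit integer derived from the canonical encoding."""
    digest = hashlib.sha256(canonical_bytes(obj)).hexdigest()
    return int(digest, 16)


a = frozenset([frozenset([1, 2, 3]), frozenset(["a", "b", "c"])])
b = frozenset([frozenset(["c", "b", "a"]), frozenset([3, 2, 1])])
print(persistent_hash(a) == persistent_hash(b))  # equal sets give equal integers
```

The integer is still a hash, so collisions remain possible in principle, but it depends only on the contents of the sets and not on the platform's word size or on set iteration order.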


EDIT: I thought I'd add a basic example:

>>> import cPickle as pkl
>>> import hashlib as hl
>>> s = frozenset([1,2,3])
>>> p = pkl.dumps(sorted(s))  # make sure you use the same pickle protocol on all platforms!
>>> p
'(lp1\nI1\naI2\naI3\na.'
>>> h = hl.md5(p)
>>> h
<md5 HASH object @ 0xb76fb110>
>>> h.digest()  # this should be consistent
"\x89\xaeG\x1d'\x83\xa5\xbd\xac\xa7\x1c\xd9\x1d/2t"
Answered by mac, Sep 27 '22 21:09