
Is Python's hash() portable?

Is Python's hash() function portable?

By "portable" I mean: will it return the same results (for the same data) across Python versions, platforms and implementations?

If not, is there any alternative to it that provides such features (while still capable of hashing common data-structures)?


The documentation is not particularly helpful. This question refers to a library that seems to roll its own version, but I'm not sure non-portability is the reason for it.

loopbackbee asked Sep 28 '22 11:09


1 Answer

No, hash() is not guaranteed to be portable.

Python 3.3 and newer also use hash randomisation by default: certain types are hashed with a seed picked at interpreter start-up, so their hash values differ between interpreter invocations.

From the object.__hash__() documentation:

By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.

Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).

See also PYTHONHASHSEED.
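
To see the effect of the seed, here is a minimal sketch that starts fresh interpreters with a chosen PYTHONHASHSEED and compares the results. The helper name hash_in_fresh_interpreter is made up for this illustration. Note that pinning the seed only makes the value repeatable for the same interpreter build; it does not make hash() portable across versions or platforms.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(value, seed):
    """Return hash(value) as computed by a new interpreter started
    with PYTHONHASHSEED set to `seed` (hypothetical helper)."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", value],
        env=env,
    )
    return int(output.decode().strip())


# Same seed: the two runs agree (on this interpreter build).
print(hash_in_fresh_interpreter("spam", "42"))
print(hash_in_fresh_interpreter("spam", "42"))

# "random" requests a new seed at every start-up, so these normally differ.
print(hash_in_fresh_interpreter("spam", "random"))
print(hash_in_fresh_interpreter("spam", "random"))
```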

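Python 2.6.8, 2.7.3, 3.1.5 and 3.2.3 and newer support the same feature but leave it disabled by default; it can be switched on with the -R command-line option or by setting PYTHONHASHSEED.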

Python 3.2 introduced a sys.hash_info named tuple that gives you details about the hash implementation for the current interpreter.
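
As a quick illustration, the fields can be inspected like this. The getattr fallback is there because, as far as I know, the algorithm field only appeared in Python 3.4.

```python
import sys

info = sys.hash_info

# Width of hash values in bits; this is one reason values differ
# between 32-bit and 64-bit builds.
print("hash width in bits:", info.width)

# Prime modulus used when hashing numeric types.
print("modulus for numeric hashes:", info.modulus)

# Name of the str/bytes hash algorithm, where the interpreter reports it.
print("str/bytes algorithm:", getattr(info, "algorithm", "unknown"))
```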

If you need a portable hash, there are plenty of implementations. The standard library includes a cryptographic hash library called hashlib; it implements standardised algorithms such as MD5 and the SHA family, so the same input bytes give the same digest on every version, platform and implementation. Another option would be the mmh3 package, which provides MurmurHash3 non-cryptographic hash function implementations.
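
If you want an integer like the one hash() returns, one possible approach is to truncate a hashlib digest. This is only a sketch: the name portable_hash, the choice of SHA-256, and the 64-bit big-endian truncation are all arbitrary choices made for the example.

```python
import hashlib


def portable_hash(data):
    """Return a 64-bit integer derived from the SHA-256 digest of `data`.

    `data` must be bytes; encode str values explicitly (e.g. UTF-8)
    so that the input is the same on every platform.
    """
    digest = hashlib.sha256(data).digest()
    # Take the first 8 bytes with a fixed byte order so the result
    # does not depend on the machine's endianness.
    return int.from_bytes(digest[:8], "big")


print(portable_hash("spam".encode("utf-8")))
```

Unlike hash(), the result does not depend on PYTHONHASHSEED or on the interpreter's word size.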

Common data structures would need to be converted to bytes first; you could use serialisation for that, like the json or pickle modules.
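
For the digest to be stable, the serialisation itself has to be deterministic. A minimal sketch using json follows; structure_digest is a hypothetical helper, and it assumes the data contains only JSON-compatible types with string keys. Sorting the keys and fixing the separators means two dicts with the same contents serialise to the same bytes regardless of insertion order. I would be more cautious with pickle here, since its output can vary with the protocol version.

```python
import hashlib
import json


def structure_digest(obj):
    """Return a SHA-256 hex digest of a JSON-compatible structure."""
    canonical = json.dumps(
        obj,
        sort_keys=True,          # key order no longer depends on insertion order
        separators=(",", ":"),   # fixed, whitespace-free separators
        ensure_ascii=True,       # non-ASCII characters become \u escapes
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"name": "spam", "tags": ["x", "y"], "count": 3}
b = {"count": 3, "tags": ["x", "y"], "name": "spam"}

# Same contents, different insertion order: the digests match.
print(structure_digest(a) == structure_digest(b))
```

Keep in mind that JSON cannot tell a tuple from a list, so (1, 2) and [1, 2] would get the same digest with this approach.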

Martijn Pieters answered Oct 03 '22 14:10