
Does python's hash function remain identical across different versions?

I'm currently using hash() on tuples of integers and strings (and nested tuples of integers and strings, etc.) to determine whether some objects are unique. Setting aside the possibility of hash collisions, is hash() guaranteed to return the same result for these data types across different versions of Python?

Claudiu asked May 09 '13 00:05



2 Answers

No. Apart from long-standing differences between 32- and 64-bit builds of Python, hash randomization was enabled by default in Python 3.3 to resolve a security issue. From the documentation:

By default, the hash() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.

Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).

As a result, from 3.3 onwards hash() is not even guaranteed to return the same result across different invocations of the same Python version.
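The per-process salting can be observed directly. Below is a minimal sketch, assuming Python 3.3 or later and that the interpreter is allowed to spawn a child process; the helper name `hash_in_fresh_process` is made up for illustration. It starts a new interpreter with a chosen `PYTHONHASHSEED` and reports the value of hash() there.

```python
import os
import subprocess
import sys


def hash_in_fresh_process(value, seed):
    """Return hash(value) as computed by a new interpreter.

    `seed` is passed through the PYTHONHASHSEED environment variable,
    which fixes the salt used for str and bytes hashing.
    (Hypothetical helper, for illustration only.)
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%r))" % (value,)],
        env=env,
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    # Same seed: the string hash is reproducible.
    print(hash_in_fresh_process("values", 1) == hash_in_fresh_process("values", 1))
    # Different seeds: the string hash differs.
    print(hash_in_fresh_process("values", 1) == hash_in_fresh_process("values", 2))
    # Small integers are not salted, so their hash does not depend on the seed.
    print(hash_in_fresh_process(42, 1) == hash_in_fresh_process(42, 2))
```

Pinning `PYTHONHASHSEED` makes string hashes repeatable between runs of one interpreter build, but it still gives no guarantee across Python versions or between 32- and 64-bit builds.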

Zero Piraeus answered Oct 18 '22 15:10


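I'm not sure exactly what you need, but you can always use hashlib if you want hashes that stay the same across runs and Python versions. Note that in Python 3 hashlib works on bytes, so strings have to be encoded first.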

>>> import hashlib
>>> t = ("values", "other")
>>> hashlib.sha256(str(t).encode("utf-8")).hexdigest()
'bc3ed71325acf1386b40aa762b661bb63bb72e6df9457b838a2ea93c95cc8f0c'

OR:

>>> h = hashlib.sha256()
>>> for item in t:
...     h.update(item.encode("utf-8"))
...
>>> h.hexdigest()
'5e98df135627bc8d98250ca7e638aeb2ccf7981ce50ee16ce00d4f23efada068'
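One caveat with feeding items to update() one after another is that the boundaries between them are lost: ("ab", "c") and ("a", "bc") produce the same digest. For the nested tuples of integers and strings described in the question, a minimal sketch along the following lines avoids that by tagging each value with its type and length. The function names `stable_digest` and `_feed` and the encoding scheme are assumptions made up for this example, not a standard format.

```python
import hashlib


def _feed(h, obj):
    """Feed an unambiguous byte encoding of obj into the hash object h."""
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        # "s<length>:<bytes>" keeps item boundaries distinct.
        h.update(b"s" + str(len(data)).encode("ascii") + b":" + data)
    elif isinstance(obj, int):
        # bool is a subclass of int and is encoded via str(), so True and 1
        # get different digests here (unlike with the built-in hash()).
        data = str(obj).encode("ascii")
        h.update(b"i" + str(len(data)).encode("ascii") + b":" + data)
    elif isinstance(obj, tuple):
        # "t<item count>:" followed by each item, recursively.
        h.update(b"t" + str(len(obj)).encode("ascii") + b":")
        for item in obj:
            _feed(h, item)
    else:
        raise TypeError("unsupported type: %s" % type(obj).__name__)


def stable_digest(obj):
    """Return a SHA-256 hex digest for nested tuples of ints and strs."""
    h = hashlib.sha256()
    _feed(h, obj)
    return h.hexdigest()


if __name__ == "__main__":
    print(stable_digest(("values", "other")))
    print(stable_digest((1, ("x", 2))))
```

Because the digest depends only on the encoded bytes, it does not change with the hash seed, the Python version, or the platform word size, as long as the encoding scheme itself is left unchanged.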

monkut answered Oct 18 '22 14:10