I'm using hashing of strings for seeding random states in the following way:
    import numpy as np

    context = "string"
    seed = hash(context) % 4294967295  # This is necessary to keep the hash within allowed seed values
    np.random.seed(seed)
Unfortunately (for my usage), this is non-deterministic between runs in Python 3.3 and later. I know I could set the PYTHONHASHSEED environment variable to an integer value to regain determinism, but I would prefer something that feels less hacky and doesn't entirely disregard the extra security added by hash randomization. Suggestions?
Built-in hashing: the result of hash() on a string is different, and will be different, for each new Python invocation. Python has never guaranteed that hash() is deterministic.
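A minimal sketch that makes this visible by running hash() in fresh interpreter processes, assuming the current interpreter can be relaunched via sys.executable. The helper name hash_in_fresh_interpreter is hypothetical. Setting PYTHONHASHSEED to a fixed integer pins the result; different values (or the default, random) give different results.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text: str, hashseed: str) -> int:
    """Hypothetical helper: run hash(text) in a new Python process with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=hashseed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same PYTHONHASHSEED: same hash. Different PYTHONHASHSEED: (almost certainly) different hash.
print(hash_in_fresh_interpreter("string", "1") == hash_in_fresh_interpreter("string", "1"))
print(hash_in_fresh_interpreter("string", "1") == hash_in_fresh_interpreter("string", "2"))
```

This is exactly the behaviour the question runs into: without a pinned PYTHONHASHSEED, each run effectively gets a new random key.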
Deterministic: a hash procedure must be deterministic, meaning that for a given input value it must always generate the same hash value. In other words, it must be a function of the data to be hashed, in the mathematical sense of the term.
hashlib contains many different secure hash algorithms, which are by definition deterministic. This is something we can work with: it changes our problem from "we need a stable hash function" to "we need a deterministic way of serializing objects to bytes".
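One possible deterministic serialization, sketched under the assumption that the objects are JSON-serializable: json.dumps with sort_keys=True and fixed separators produces the same bytes regardless of dict insertion order, and SHA-256 turns those bytes into a stable digest. JSON is a swapped-in choice of serializer, not something the text prescribes; the helper name stable_digest is hypothetical.

```python
import hashlib
import json


def stable_digest(obj) -> str:
    """Hypothetical helper: hex SHA-256 of a JSON-serializable object, stable across runs."""
    # sort_keys and fixed separators give one canonical text form for dicts and lists.
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


print(stable_digest({"a": 1, "b": [1, 2]}) == stable_digest({"b": [1, 2], "a": 1}))  # True
```

Note the limits of this choice: JSON turns tuples into lists and non-string dict keys into strings, so objects that differ only in those ways will hash the same.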
Python has a standard library module, hashlib, that is designed to provide a common interface to different secure hashing algorithms. The module provides a constructor for each type of hash; for example, the sha256() constructor creates a SHA-256 hash object.
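A short illustration of the sha256() constructor: the digest can be computed in one shot by passing bytes to the constructor, or incrementally with update(), and both give the same result in every run.

```python
import hashlib

# One shot: pass the bytes directly to the constructor.
one_shot = hashlib.sha256(b"string").hexdigest()

# Incremental: feeding the same bytes in pieces via update() yields the same digest.
h = hashlib.sha256()
h.update(b"str")
h.update(b"ing")
incremental = h.hexdigest()

print(one_shot == incremental)  # True
```

hashlib works on bytes, so a str such as context has to be encoded first, e.g. context.encode("utf-8").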
Forcing Python's built-in hash to be deterministic is intrinsically hacky. If you want to avoid hackitude, use a different hashing function; see, e.g., hashlib in Python 2: https://docs.python.org/2/library/hashlib.html, and in Python 3: https://docs.python.org/3/library/hashlib.html
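Applied to the question, a minimal sketch: derive the seed from a SHA-256 digest of the string instead of hash(). Taking the first 4 bytes as an unsigned integer keeps the value in [0, 2**32 - 1], the range np.random.seed accepts. The helper name stable_seed is hypothetical, and the stdlib random.Random is used here only to show reproducibility without a NumPy dependency.

```python
import hashlib
import random


def stable_seed(context: str) -> int:
    """Hypothetical helper: map a string to a seed in [0, 2**32 - 1] that is identical in every run."""
    digest = hashlib.sha256(context.encode("utf-8")).digest()
    # Interpret the first 4 digest bytes as an unsigned 32-bit integer.
    return int.from_bytes(digest[:4], "big")


seed = stable_seed("string")
rng_a = random.Random(seed)
rng_b = random.Random(seed)
print(rng_a.random() == rng_b.random())  # True
```

With NumPy, the same value can replace the hash() call from the question: np.random.seed(stable_seed(context)), or np.random.default_rng(stable_seed(context)) with the newer Generator API. PYTHONHASHSEED stays untouched, so hash randomization remains in effect for everything else.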