We have a large collection of Python code that takes some input and produces some output.
We would like to guarantee that, given identical input, we produce identical output regardless of Python version or local environment (e.g. whether the code is run on Windows, Mac, or Linux, in 32-bit or 64-bit).
We have been enforcing this in an automated test suite by running our program both with and without the -R option to python and comparing the output, on the assumption that this would shake out any spots where our output accidentally wound up dependent on iteration over a dict (the most common source of non-determinism in our code).
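For reference, a check of this kind can be sketched roughly as follows. This is only a minimal sketch, not our actual test suite: it assumes Python 3.5+ for subprocess.run, it uses the PYTHONHASHSEED environment variable instead of -R, and the helper names (run_with_seed, outputs_by_seed, is_deterministic) are made up for illustration.

```python
import os
import subprocess
import sys


def run_with_seed(argv, seed):
    """Run a command with PYTHONHASHSEED set and return its stdout as bytes."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(argv, env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout


def outputs_by_seed(argv, seeds=(0, 1, 2, 3)):
    """Map each hash seed to the output the command produced under it."""
    return {seed: run_with_seed(argv, seed) for seed in seeds}


def is_deterministic(argv, seeds=(0, 1, 2, 3)):
    """True if the command printed the same bytes under every seed."""
    return len(set(outputs_by_seed(argv, seeds).values())) == 1


if __name__ == "__main__":
    # Hypothetical usage: pass the program under test on the command line,
    # e.g.  python check_seeds.py our_program.py input.txt
    command = [sys.executable] + sys.argv[1:]
    print("deterministic" if is_deterministic(command) else "output varies")
```

A check like this only exercises hashes that the seed actually randomizes, so it inherits the same blind spot as -R.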
However, when we recently adjusted our code to also support Python 3, we discovered a place where our output depended in part on iteration over a dict that used ints as keys. This iteration order changed in Python 3 compared to Python 2, and was making our output different. Our existing tests (all on Python 2.7) didn't notice this, because -R doesn't affect the hash of ints. Once found, it was easy to fix, but we would like to have found it earlier.
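To make the gap concrete, here is a small illustration, assuming CPython: the hash of a small non-negative int is the int itself, so there is nothing for a hash seed to randomize. The variable names are just for this example.

```python
# Assumption: CPython, where small non-negative ints hash to themselves.
small_ints = [0, 1, 2, 42, 1000]
int_hashes = [hash(n) for n in small_ints]

# Each int's hash equals its value, independent of -R or PYTHONHASHSEED.
hashes_match_values = (int_hashes == small_ints)
print(hashes_match_values)
```

Any container whose iteration order follows the hash of its int keys will therefore iterate the same way on every run of a given interpreter, with or without -R.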
Is there any way to further stress-test our code and give us confidence that we've ferreted out all places where we end up implicitly depending on something that might differ across Python versions or environments? I think that something like -R or PYTHONHASHSEED that applied to numbers as well as to str, bytes, and datetime objects could work, but I'm open to other approaches. I would, however, like our automated test machine to need only a single Python version installed, if possible.
Another acceptable alternative would be some way to run our code with PyPy tweaked so as to use a different order when iterating items out of a dict; I think our code runs on PyPy, though it's not something we've ever explicitly supported. However, if some PyPy expert gives us a way to tweak dictionary iteration order on different runs, it's something we'll work towards.
Python's hash() function is a built-in that returns the hash value of an object if it has one. The hash value is an integer that is used to quickly compare dictionary keys during a dictionary lookup. Syntax: hash(obj)
This function takes a hashable Python object (typically an immutable one) and returns its hash value. The value is computed by the object's __hash__() method, which hash() calls internally. That hash function needs to be good enough to give an almost random distribution.
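As a small illustration of hash() delegating to __hash__(), here is a sketch with a made-up Point class (the class and its fields are purely illustrative):

```python
class Point:
    """Illustrative value type that is usable as a dict key."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # hash(point) ends up here; equal points must return equal hashes.
        return hash((self.x, self.y))


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


lookup = {Point(1, 2): "found"}
print(lookup[Point(1, 2)])   # an equal key hashes the same, so the lookup hits
print(is_hashable([1, 2]))   # lists are mutable and define no usable __hash__
```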
Using PyPy is not the best choice here, given that it always retains insertion order in its dicts (using a method that also makes dicts use less memory). We could of course make it change the order in which dicts are enumerated, but that defeats the point.
Instead, I'd suggest hacking the CPython source code to change the way the hash is used inside dictobject.c. For example, after each hash = PyObject_Hash(key); if (hash == -1) { ..error.. }; you could add hash ^= HASH_TWEAK; and compile different versions of CPython with different values for HASH_TWEAK. (I did such a thing at one point, but I can't find it any more. You need to be a bit careful about where the hash values are the original ones or the modified ones.)
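To get a feel for what XOR-ing the hash does to iteration order without rebuilding the interpreter, here is a pure-Python imitation. It is not the CPython patch described above, only a sketch: TweakedKey and iteration_order are hypothetical names, and a set is used because its iteration order still follows the hash in current CPython, whereas dicts in CPython 3.7+ preserve insertion order.

```python
class TweakedKey:
    """Hypothetical wrapper whose hash is the payload's hash XOR a tweak."""

    __slots__ = ("value", "tweak")

    def __init__(self, value, tweak):
        self.value = value
        self.tweak = tweak

    def __hash__(self):
        # Mirrors the idea of `hash ^= HASH_TWEAK`, but at the Python level.
        return hash(self.value) ^ self.tweak

    def __eq__(self, other):
        return isinstance(other, TweakedKey) and self.value == other.value


def iteration_order(values, tweak):
    """Return the payloads in the order a set of wrapped keys yields them."""
    wrapped = {TweakedKey(v, tweak) for v in values}
    return [key.value for key in wrapped]


plain = iteration_order(range(8), tweak=0)
tweaked = iteration_order(range(8), tweak=5)
print(plain)
print(tweaked)
```

On CPython the two printed orders should differ even though the keys are ints, which is the effect a different HASH_TWEAK build would have on hash-ordered containers. Code that goes through a wrapper like this is only a partial stand-in, since it cannot reach containers that the program builds from unwrapped keys.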