Equivalent to python's -R option that affects the hash of ints

Tags: python, python-3.x, pypy, python-2.7

We have a large collection of Python code that takes some input and produces some output.

We would like to guarantee that, given identical input, we produce identical output regardless of Python version or local environment (e.g. whether the code is run on Windows, Mac, or Linux, in 32-bit or 64-bit).

We have been enforcing this in an automated test suite by running our program both with and without Python's -R option and comparing the output, on the assumption that this would shake out any spots where our output accidentally depended on the iteration order of a dict (the most common source of non-determinism in our code).
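For illustration, the kind of check described above can be sketched roughly like this. It is only a minimal sketch, not our actual test suite: it assumes Python 3.7+ for the harness itself, uses the PYTHONHASHSEED environment variable rather than -R, and the helper names `run_with_seed` and `outputs_agree` as well as the `STABLE` snippet are made up for the example.

```python
import os
import subprocess
import sys


def run_with_seed(code, seed):
    """Run `code` in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    # Copy the current environment so the child still finds what it needs.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def outputs_agree(code, seeds=(0, 1, 2, 3)):
    """Return True if `code` prints the same text under every hash seed."""
    outputs = {run_with_seed(code, seed) for seed in seeds}
    return len(outputs) == 1


# Hypothetical stand-in for "our program": it sorts the keys before
# printing, so its output should not depend on the hash seed.
STABLE = "print(sorted({'a': 1, 'b': 2, 'c': 3}))"

if __name__ == "__main__":
    print(outputs_agree(STABLE))
```

A check like this only varies the hashes that the seed actually randomizes, which is exactly the limitation described next.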

However, when we recently adjusted our code to also support Python 3, we discovered a place where our output depended in part on iteration over a dict that used ints as keys. This iteration order changed between Python 2 and Python 3, which made our output differ. Our existing tests (all on Python 2.7) didn't notice this, because -R doesn't affect the hash of ints. Once found, it was easy to fix, but we would like to have found it earlier.
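To make the gap concrete, here is a small sketch that compares the hash of an int and of a str under two different hash seeds. It assumes CPython 3.7+ and uses PYTHONHASHSEED instead of -R; the names `PROBE` and `hashes_under_seed` are invented for this example.

```python
import os
import subprocess
import sys

# Code run in the child interpreter: print one int hash and one str hash.
PROBE = "print(hash(12345), hash('12345'))"


def hashes_under_seed(seed):
    """Return (int_hash, str_hash) as seen by a child run with this seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    int_hash, str_hash = out.split()
    return int(int_hash), int(str_hash)


if __name__ == "__main__":
    first = hashes_under_seed(1)
    second = hashes_under_seed(2)
    # The int hash is unaffected by the seed; the str hash is randomized.
    print("int hash unchanged:", first[0] == second[0])
    print("str hash unchanged:", first[1] == second[1])
```

Since the int hash stays the same for every seed, no amount of seed variation will reorder a container keyed by ints.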

Is there any way to further stress-test our code and give us confidence that we've ferreted out all the places where we implicitly depend on something that may differ across Python versions or environments? I think that something like -R or PYTHONHASHSEED that applied to numbers as well as to str, bytes, and datetime objects could work, but I'm open to other approaches. I would, however, like our automated test machine to need only a single Python version installed, if possible.

Another acceptable alternative would be some way to run our code with PyPy tweaked so as to use a different order when iterating over the items of a dict. I think our code runs on PyPy, though it's not something we've ever explicitly supported. However, if some PyPy expert gives us a way to tweak dictionary iteration order on different runs, it's something we'll work towards.

304

asked Jun 02 '17 08:06

Daniel Martin

1 Answer

Using PyPy is not the best choice here, given that it always retains insertion order in its dicts (using an implementation that also makes dicts use less memory). We could of course make it change the order in which dicts are enumerated, but that defeats the point.

Instead, I'd suggest hacking the CPython source code to change the way the hash is used inside dictobject.c. For example, after each hash = PyObject_Hash(key); if (hash == -1) { ..error.. }; you could add hash ^= HASH_TWEAK; and compile different versions of CPython with different values for HASH_TWEAK. (I did such a thing at one point, but I can't find it any more. You need to be a bit careful about where the hash values are the original ones and where they are the modified ones.)
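To get a feel for what XOR-ing the hash does without rebuilding CPython, the idea can be imitated in pure Python. This is only a rough sketch and not the dictobject.c patch itself: it wraps ints in a subclass whose `__hash__` is XOR-ed with a tweak, and it uses a set rather than a dict because CPython 3.7+ dicts keep insertion order. The names `make_tweaked_int` and `set_order` are made up for the illustration, and the exact orders printed depend on the interpreter.

```python
def make_tweaked_int(tweak):
    """Build an int subclass whose hash is XOR-ed with `tweak`."""

    class TweakedInt(int):
        # Equality is inherited from int, so the class stays consistent
        # with itself, but not with plain ints of the same value.
        def __hash__(self):
            return int.__hash__(self) ^ tweak

    return TweakedInt


def set_order(values, tweak):
    """Return the order in which a set of tweaked ints is iterated."""
    cls = make_tweaked_int(tweak)
    tweaked = set(cls(v) for v in values)
    # Convert back to plain ints so the result is easy to compare.
    return [int(v) for v in tweaked]


if __name__ == "__main__":
    for tweak in (0, 5, 0x55):
        print(tweak, set_order(range(10), tweak))
```

Unlike the patched interpreter, this only affects values you wrap explicitly, so it is useful for understanding the effect rather than for stress-testing an existing code base.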

107

answered Oct 03 '22 06:10

Armin Rigo
