
Numerical simulation giving different results in Python 3.2 vs 3.3

This might be a weird question, but here it goes:

I have a numerical simulation. It's not a particularly long program, but somewhat lengthy to explain what it's doing. I am running the simulation a thousand times and computing the average result and the variance, and the variance is quite small, on the order of 10^(-30).

However, I have noticed that when I run the program in Python 3.3, things get weird. In Python 2.7 and Python 3.2 I always get the same answer, every time: same averages, same tiny variances.

But when I run it in Python 3.3, I get a different answer every time: a different average, and different (but still tiny) variances. This is extremely odd, because the laws of probability say that this can't happen if the variance is actually that small. So I'm wondering: what the hell changed in the 3.3 interpreter since 3.2 that's causing my simulations to go crazy?

Here are some things I've thought of:

  • I might have a weird 32-bit/64-bit discrepancy in my versions of Python, but I checked and they're all running 64-bit.
  • I might have errors in float/int conversions, but those would already be handled the same way in Python 3.2, since division returns floats when appropriate there, so the 3.2 and 3.3 results should agree.
  • My simulations are represented as generators, so maybe something changed in 3.3 with generators, but I can't tell what that is.
  • There is some change in numerical floating point representations that I have no idea about.
  • There is some underlying change in one of those functions whose result is "undetermined" that affects the initial conditions of my algorithm. For example, somewhere in my code I order my data columns (originally a dictionary) using "list(table.keys())", and there may have been a change from 3.2 to 3.3 in how a dictionary's keys get ordered. But if that were the case, the code should still do the same thing on every run, and it doesn't (it would seem quite odd to intentionally make the ordering of a list random!). A small sketch of what I mean is below this list.
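
In case it helps, here's a minimal sketch of what I mean by the ordering issue; "table" here is just a made-up stand-in for my real data dictionary, not the actual simulation code:

# table is a hypothetical stand-in for the real data dictionary
table = {"time": [0.0, 0.1], "position": [1.0, 1.5], "velocity": [0.2, 0.3]}

# The order of list(table.keys()) depends on how the string keys hash.
columns_hash_order = list(table.keys())

# Sorting the keys removes any dependence on hash values.
columns_sorted = sorted(table.keys())
print(columns_sorted)  # always ['position', 'time', 'velocity']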

Does anyone have pointers to what changed from 3.2 to 3.3 that might be causing my problems?

JeremyKun


1 Answer

Your last bullet point is most likely the cause. In Python 3.3, hash randomization was enabled by default to address a security concern. Basically, the idea is that you now never know exactly how your strings will hash, and that hash determines their iteration order in a dictionary.

Here's a demo:

d = {"a": 1, "b": 2, "c": 3}
print(d)

On my machine, with Python 3.4, three consecutive runs produce three differently ordered results:

$ python3.4 test.py
{'a': 1, 'c': 3, 'b': 2}
$ python3.4 test.py
{'c': 3, 'b': 2, 'a': 1}
$ python3.4 test.py
{'b': 2, 'c': 3, 'a': 1}

Before hash randomization, if you knew how strings would hash, a malicious attacker with enough knowledge of your application could feed it data crafted to collide, causing dictionary lookups to degrade to O(n) time instead of the usual O(1). That could mean serious performance problems for some applications.
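
Randomizing the hash seed per process is what defeats that: neither an attacker nor your own code can predict how a given string will hash from one run to the next. A quick way to see this for yourself (a minimal sketch; the printed numbers will differ on your machine and between runs):

# Under hash randomization (the 3.3 default), str hashes are seeded per
# interpreter process, so the first line usually prints a different number
# on each run. Hashes of small ints are unaffected: hash(1) is always 1.
print(hash("a"))
print(hash(1))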

You can disable the hash randomization as documented here. At some point, a -R flag was also added to python, which enabled hash randomization on an "opt-in" basis. That option is available at least as far back as python3.2, so you could use it to test this hypothesis.
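
Concretely, on CPython 3.3 and later you can control this through the PYTHONHASHSEED environment variable. Assuming test.py is the demo script above, something along these lines (a sketch, not tested on your setup) makes the ordering reproducible again:

$ PYTHONHASHSEED=0 python3.3 test.py       # randomization off: same order on every run
$ PYTHONHASHSEED=12345 python3.3 test.py   # fixed seed: also reproducible run to run
$ python3.2 -R test.py                     # recent 3.2 builds: -R opts in to randomization instead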


mgilson