
How to make the compilation of python source code reproducible

After installing jsonpickle on my machine (pip install jsonpickle==1.4.1 --no-compile), I noticed that the compilation of the pandas.py file in the ext subfolder is not always reproducible.

In the ext subfolder I executed the following command to compile all .py files to .pyc files:

python -m compileall -d somereldir --invalidation-mode checked-hash

This created a pandas.cpython-37.pyc file in the __pycache__ subdirectory. In that subdirectory, I then executed: xxd pandas.cpython-37.pyc > hex1.hex

When I repeat the steps above and write the hexdump to hex2.hex, two lines do not match:

diff hex1.hex hex2.hex
288,289c288,289
< 000011f0: 0029 013e 0200 0000 723f 0000 00da 056e  .).>....r?.....n
< 00001200: 616d 6573 7213 0000 0029 0372 3300 0000  amesr....).r3...
---
> 000011f0: 0029 013e 0200 0000 da05 6e61 6d65 7372  .).>......namesr
> 00001200: 3f00 0000 7213 0000 0029 0372 3300 0000  ?...r....).r3...

I repeated this several times, and there appear to be two "versions" of the .pyc file: sometimes two runs match, sometimes they don't.
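A hexdump shows where two .pyc files differ, but not which constant is involved. As a rough sketch (not part of the steps above), the code object inside a .pyc can be loaded with the marshal module and searched for frozenset constants, which are one place where element order can vary. The helper names load_code and frozenset_constants are hypothetical, the 16-byte header size assumes Python 3.7 or later, and the file must have been produced by the same Python version that reads it.

```python
import marshal

# Python 3.7+ .pyc header: magic (4 bytes) + flags (4 bytes)
# + 8 bytes holding either the source hash or mtime and size.
PYC_HEADER_SIZE = 16


def load_code(pyc_path):
    """Return the code object stored in a .pyc file (only use on trusted files)."""
    with open(pyc_path, "rb") as f:
        f.seek(PYC_HEADER_SIZE)
        return marshal.load(f)


def frozenset_constants(code):
    """Yield (code name, first line, constant) for each frozenset constant, recursively."""
    for const in code.co_consts:
        if isinstance(const, frozenset):
            yield code.co_name, code.co_firstlineno, const
        elif hasattr(const, "co_consts"):
            # Nested code object: a function, class body, or comprehension.
            yield from frozenset_constants(const)
```

Calling list(frozenset_constants(load_code("pandas.cpython-37.pyc"))) lists the enclosing code object's name and starting line for every such constant, which narrows down where to look in the source.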

Because of this, I have several questions:

  1. Why is there a difference in the .pyc files?
  2. How can I make sure that the compiled .pyc file is always the same?
  3. I checked some other Python libraries and all of them produced reproducible .pyc files, so what is different about this pandas.py file?
Hadronymous asked Dec 13 '25 14:12



1 Answer

After splitting the pandas.py file into smaller parts and compiling each of them, I was able to trace the problem to line 135:

name_bundle = {k: v for k, v in meta.items() if k in {'name', 'names'}}

which answers the questions:

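  1. Line 135 contains a set literal ({'name', 'names'}). Sets are unordered: the iteration order of a set of strings depends on the strings' hash values, and Python randomizes string hashes per process unless PYTHONHASHSEED is fixed. The compiler stores this set as a constant in the .pyc file, so the order in which its two elements are written can change from one compilation to the next, which matches the swapped "names" entry in the hexdump. Dictionaries preserve insertion order as of Python 3.7, but I could not find any such guarantee for sets.
  2. Set the environment variable PYTHONHASHSEED to a fixed value (for example PYTHONHASHSEED=0) before running compileall.
  3. The other libraries I checked probably do not contain a set literal like this one.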
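The effect described in points 1 and 2 can be checked with a small experiment. The sketch below compiles the line from pandas.py in a fresh interpreter for each hash seed and prints a digest of the marshalled code object, so differing digests indicate differing .pyc contents. The helper name compiled_digest and the chosen seed values are illustrative assumptions. Newer CPython releases may serialize sets deterministically, in which case all digests will be equal.

```python
import hashlib
import os
import subprocess
import sys

# The line identified in the answer; it is only compiled, never executed.
SNIPPET = "name_bundle = {k: v for k, v in meta.items() if k in {'name', 'names'}}"

# Program run in the child interpreter: compile the snippet passed as an
# argument and print a SHA-256 digest of the marshalled code object.
CHILD = (
    "import hashlib, marshal, sys\n"
    "code = compile(sys.argv[1], 'pandas.py', 'exec')\n"
    "print(hashlib.sha256(marshal.dumps(code)).hexdigest())\n"
)


def compiled_digest(seed):
    """Compile SNIPPET in a new process with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD, SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The same seed should always give the same digest; different seeds may not.
    for seed in (0, 0, 1, 2, 3):
        print(seed, compiled_digest(seed))
```

The same idea applies to the original command: exporting a fixed PYTHONHASHSEED in the shell before running python -m compileall keeps the hash seed, and therefore the set order, constant between runs.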
Hadronymous answered Dec 16 '25 20:12



