I need to ship a compiled version of a Python script and be able to prove (using a hash) that the compiled file is indeed the same as the original one.
What we use so far is a simple:
find . -name "*.py" -print0 | xargs -0 python2 -m py_compile
The issue is that this is not reproducible: two executions will not give us the same .pyc for the same Python file, and I'm not sure what the fluctuating factors are. This forces us to always ship the same compiled version instead of being able to just give the build script to anyone to produce a new compiled version.
Is there a way to achieve that?
Thanks
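Compiled Python files include a four-byte magic number followed by a four-byte timestamp. In Python 2 that timestamp is the modification time of the source file, not the moment of compilation, so it changes whenever the source files are touched or freshly checked out. This probably accounts for the discrepancies you are seeing.
If you omit bytes 5-8 (the timestamp) from the checksumming process then you should see constant checksums for a given version of Python.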
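As a rough sketch of that checksum, the function below hashes a .pyc while leaving out bytes 5-8. The function name and the choice of SHA-256 are my own assumptions, and the offsets assume the Python 2 header layout (magic number, then timestamp). Python 3.7+ uses a longer header in which the timestamp sits at bytes 9-12, so the offsets would need adjusting there.

```python
import hashlib


def pyc_digest_without_timestamp(pyc_path, skip_start=4, skip_end=8):
    """Return a SHA-256 hex digest of a .pyc, ignoring the timestamp field.

    skip_start/skip_end are 0-based slice offsets; the defaults cover
    bytes 5-8 of the file (hypothetical helper, Python 2 layout assumed).
    """
    with open(pyc_path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256()
    digest.update(data[:skip_start])  # the four-byte magic number
    digest.update(data[skip_end:])    # everything after the timestamp
    return digest.hexdigest()
```

Two builds of the same source with the same interpreter version should then give the same digest even if the embedded timestamps differ.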
The format of the .pyc file is given in this blog post by Ned Batchelder.
2019 / Python 3.7+ update: since PEP 552, either
python -m compileall -f --invalidation-mode=checked-hash [file|dir]
# or
export SOURCE_DATE_EPOCH=1 # set py_compile to use
python -m py_compile [file ...] # py_compile.PycInvalidationMode.CHECKED_HASH
will create .pycs which will not change until their source code changes.
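The same thing can be done from Python code. This is a minimal sketch for Python 3.7+ and not a definitive recipe: the two helper names are made up for illustration, and passing dfile is my own suggestion for keeping the build directory's path out of the compiled file.

```python
import importlib.util
import py_compile


def compile_with_source_hash(source_path, pyc_path, display_name=None):
    """Compile source_path to pyc_path with a hash-based (PEP 552) header.

    display_name is recorded in the code object instead of the real path,
    so the output does not depend on where the build was run.
    """
    py_compile.compile(
        source_path,
        cfile=pyc_path,
        dfile=display_name,
        doraise=True,  # raise PyCompileError instead of printing it
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    with open(pyc_path, "rb") as f:
        return f.read()


def header_matches_source(pyc_bytes, source_bytes):
    """Check that the .pyc header carries the hash of the given source."""
    flags = int.from_bytes(pyc_bytes[4:8], "little")
    is_hash_based = bool(flags & 0b01)  # lowest flag bit marks a hash-based .pyc
    embedded_hash = pyc_bytes[8:16]     # 8-byte source hash follows the flags
    return is_hash_based and embedded_hash == importlib.util.source_hash(source_bytes)
```

With this mode the header holds a hash of the source instead of its modification time, so touching the file no longer changes the header. Anyone holding the original .py can also use header_matches_source to check that a shipped .pyc was built from it.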