
What is the advantage of setting zip_safe to True when packaging a Python project?

The setuptools documentation only states:

For maximum performance, Python packages are best installed as zip files. Not all packages, however, are capable of running in compressed form, because they may expect to be able to access either source code or data files as normal operating system files. So, setuptools can install your project as a zipfile or a directory, and its default choice is determined by the project's zip_safe flag (reference).

In practical terms, what is the performance benefit gained? Is it worth investigating if my projects are zip-safe, or are the benefits generally minimal?

saffsd asked Apr 08 '13 00:04




2 Answers

Zip files take up less space on disk, so they can be read from disk more quickly. Since most workloads are I/O bound, the overhead of decompressing the package may be smaller than the overhead of reading a larger file from disk. Moreover, a single, small-ish zip file is likely to be stored sequentially on disk, while a collection of smaller files may be more spread out. On rotational media, this also improves read performance by cutting down the number of seeks. So you generally trade some CPU time for less disk I/O, which may dramatically improve your import and load times.
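Whether this trade-off pays off on a given machine is easy to measure. The following is only a rough sketch, not a rigorous benchmark: it builds two throwaway packages of tiny modules (the names `demo_dir_pkg` and `demo_zip_pkg` and the helper functions are made up for illustration), imports one from a plain directory and the other from a zip archive placed on `sys.path`, and reports the elapsed time for each. Results will vary a lot with the OS page cache, the storage medium, and bytecode caching (modules imported from a zip are not cached as `.pyc` files on disk).

```python
import importlib
import sys
import tempfile
import time
import zipfile
from pathlib import Path


def build_package(root: Path, name: str, n_modules: int) -> None:
    """Create a hypothetical package containing n_modules tiny modules."""
    pkg = root / name
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    for i in range(n_modules):
        (pkg / f"mod{i}.py").write_text(f"VALUE = {i}\n")


def zip_package(root: Path, name: str, archive: Path) -> None:
    """Store the package's .py files in a compressed zip archive."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted((root / name).rglob("*.py")):
            zf.write(path, path.relative_to(root).as_posix())


def time_imports(path_entry: str, name: str, n_modules: int) -> float:
    """Import every module of the package via path_entry; return seconds."""
    sys.path.insert(0, path_entry)
    importlib.invalidate_caches()
    try:
        start = time.perf_counter()
        for i in range(n_modules):
            importlib.import_module(f"{name}.mod{i}")
        return time.perf_counter() - start
    finally:
        sys.path.remove(path_entry)
        # Forget the imported modules so repeated runs start fresh.
        for key in [k for k in sys.modules
                    if k == name or k.startswith(name + ".")]:
            del sys.modules[key]


def compare(n_modules: int = 200) -> dict:
    """Time imports from a directory and from a zip archive."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        dir_root = tmp_path / "as_dir"
        build_package(dir_root, "demo_dir_pkg", n_modules)
        staging = tmp_path / "staging"
        build_package(staging, "demo_zip_pkg", n_modules)
        archive = tmp_path / "demo.zip"
        zip_package(staging, "demo_zip_pkg", archive)
        return {
            "directory": time_imports(str(dir_root), "demo_dir_pkg", n_modules),
            "zip": time_imports(str(archive), "demo_zip_pkg", n_modules),
        }


if __name__ == "__main__":
    for label, seconds in compare().items():
        print(f"{label}: {seconds:.4f} s")
```

On a local SSD with a warm cache the difference may be small or even reversed; the gap is more likely to show up on slow or networked storage.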

Livius answered Oct 04 '22 00:10


There are several advantages, in addition to the ones already mentioned.

Reading a single large .egg file (and unzipping it) may be significantly faster than loading multiple (potentially a lot of) smaller .py files, depending on the storage medium/filesystem on which it resides.

Some filesystems have a large block size (e.g., 1MB), which means that dealing with small files can be expensive. Even though your files are small (say, 10KB each), you may actually be loading a full 1MB block from disk when reading one of them. Typically, filesystems pack multiple small files into one large block to mitigate this a bit.
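To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes the worst case, in which every file occupies whole blocks by itself and the filesystem does no packing of small files; the function name and the file counts are illustrative only.

```python
import math


def worst_case_bytes_read(file_sizes, block_size):
    """Bytes transferred if every file is read in whole blocks.

    Assumes no packing of small files into shared blocks (worst case).
    """
    return sum(math.ceil(size / block_size) * block_size
               for size in file_sizes)


BLOCK = 1024 * 1024       # 1 MiB block size, as in the example above
SMALL = 10 * 1024         # one 10 KiB source file

loose = worst_case_bytes_read([SMALL] * 1000, BLOCK)
bundled = worst_case_bytes_read([SMALL * 1000], BLOCK)  # one uncompressed archive

print(loose)              # 1048576000
print(bundled)            # 10485760
print(loose / bundled)    # 100.0
```

Under these assumptions, 1000 loose files cost about 100 times as much I/O as a single archive of the same content, and that is before compression shrinks the archive further.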

On filesystems where access to file metadata is slow (which sometimes happens with shared filesystems, like NFS), accessing a large number of files may be very expensive too.
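The difference in metadata traffic can be illustrated with a small sketch (the helper names and the `bundle.zip` archive are made up for the example). Reading the sizes of N loose files takes one `stat()` call per file, whereas a zip archive records the metadata of all its members in its central directory, so a single open of one file is enough.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def metadata_from_directory(directory: Path) -> dict:
    """One stat() call per file: N metadata lookups on the filesystem."""
    return {p.name: os.stat(p).st_size for p in sorted(directory.iterdir())}


def metadata_from_zip(archive: Path) -> dict:
    """One open(); all sizes come from the archive's central directory."""
    with zipfile.ZipFile(archive) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}


def demo(n_files: int = 100):
    """Build n_files small files plus a zip of them; return both views."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        for i in range(n_files):
            (src / f"mod{i}.py").write_text(f"VALUE = {i}\n")
        archive = Path(tmp) / "bundle.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            for p in sorted(src.iterdir()):
                zf.write(p, p.name)
        return metadata_from_directory(src), metadata_from_zip(archive)


if __name__ == "__main__":
    from_dir, from_zip = demo()
    print(len(from_dir), "stat() calls vs. 1 archive open")
    print("same metadata:", from_dir == from_zip)
```

On a local disk both approaches are fast; the per-file lookups are what tend to hurt when each one is a network round trip.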

Of course, zipping the whole bunch also helps, since that means that less data will have to be read in total.

Long story short: it may matter a lot if your filesystem is better suited to a small number of large files.

Kenneth Hoste answered Oct 04 '22 00:10