
Should I add Python's pyc files to .dockerignore?

I've seen several examples of .dockerignore files for Python projects where *.pyc files and/or __pycache__ folders are ignored:

**/__pycache__
*.pyc

Since these files/folders are going to be recreated in the container anyway, I wonder if it's a good practice to do so.

planetp asked Jan 10 '20 15:01



1 Answer

Yes, it's a recommended practice. There are several reasons:

Reduce the size of the resulting image

Files listed in .dockerignore are excluded from the build context, so COPY and ADD cannot put them into the resulting image. This can matter when you're building the smallest possible image: roughly speaking, the bytecode files are about as large as the source files themselves. Bytecode files aren't intended for distribution, which is also why we usually add them to .gitignore.
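To get a feel for how much the bytecode adds in a particular project, a minimal sketch like the following could be used. The helper name bytecode_bytes is made up for this illustration; it only walks a directory tree and adds up file sizes.

```python
import os


def bytecode_bytes(root: str) -> int:
    """Sum the sizes of all .pyc files found under root (hypothetical helper)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total


# Example call (not executed here): bytecode_bytes(".") for the current project
```

Running it on the project root before a build gives a rough upper bound on what ignoring the cache files would keep out of the build context.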

Cache related problems

In Python versions before 3.2 there were several cache-related issues:

Python’s scheme for caching bytecode in .pyc files did not work well in environments with multiple Python interpreters. If one interpreter encountered a cached file created by another interpreter, it would recompile the source and overwrite the cached file, thus losing the benefits of caching.

Since Python 3.2, cached files carry an interpreter tag in their name, for example mymodule.cpython-32.pyc, and are stored in a __pycache__ directory. Starting with Python 3.8 you can also control where the cache is stored, through the PYTHONPYCACHEPREFIX environment variable or the -X pycache_prefix option. This is useful when the source directory is read-only but you still want the benefits of caching.
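To see the naming scheme for the interpreter you are running, something like this should work on Python 3.8+. The function name cache_path_for is an assumption made for this sketch; importlib.util.cache_from_source, sys.implementation.cache_tag and sys.pycache_prefix come from the standard library.

```python
import importlib.util
import sys


def cache_path_for(source_path: str) -> str:
    """Return where this interpreter would cache bytecode for source_path."""
    # optimization="" asks for the file name used when running without -O / -OO
    return importlib.util.cache_from_source(source_path, optimization="")


print(sys.implementation.cache_tag)   # interpreter tag, e.g. "cpython-312"
print(cache_path_for("mymodule.py"))  # path inside __pycache__ (or under the prefix)
print(sys.pycache_prefix)             # None unless a cache prefix is configured
```

The tag differs between interpreters and versions, which is why a cache built on the host is often useless to the Python inside the container anyway.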

Usually the cache system works perfectly, but occasionally something goes wrong. Note that a cached .pyc file living in the same directory as the source (not in __pycache__) will be imported in place of the .py file if the .py file is missing. This isn't a common occurrence in practice, but if code you deleted keeps being "there", removing the cache files is a good first step. It can matter when you're experimenting with Python's cache system or running scripts in different environments.
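The "missing .py" behaviour can be reproduced with a small experiment. This is only a sketch: the helper import_without_source and the module name used in the example call are invented for the demonstration, and the .pyc is deliberately written next to the source rather than into __pycache__.

```python
import importlib
import os
import py_compile
import sys
import tempfile


def import_without_source(module_name: str, source: str):
    """Compile source to a same-directory .pyc, delete the .py, then import it."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, module_name + ".py")
    with open(src, "w") as f:
        f.write(source)
    # Write the bytecode next to the source instead of into __pycache__
    py_compile.compile(src, cfile=os.path.join(workdir, module_name + ".pyc"), doraise=True)
    os.remove(src)  # the source is gone, only the .pyc remains
    sys.path.insert(0, workdir)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(workdir)


mod = import_without_source("hypothetical_stale_module", "VALUE = 42\n")
print(mod.VALUE, mod.__file__)  # the module loads although its .py no longer exists
```

A stray .pyc copied into an image this way would keep a deleted module importable, which is the kind of surprise the ignore rule avoids.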

Security reasons

Most likely you don't need to think about this, but cache files can contain sensitive information. In the current implementation, a .pyc file records the absolute path of the source file it was compiled from. There are situations where you don't want to share that information.
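One way to check this yourself is to read the code object back out of a .pyc and look at its co_filename. The sketch below assumes the 16-byte .pyc header used by CPython 3.7 and later; the function names and the file name in the demo are made up for the illustration.

```python
import marshal
import os
import py_compile
import tempfile


def source_path_recorded_in_pyc(pyc_path: str) -> str:
    """Return the source path stored in the top-level code object of a .pyc."""
    with open(pyc_path, "rb") as f:
        f.read(16)  # skip the header (assumes the CPython 3.7+ layout)
        code = marshal.load(f)
    return code.co_filename


def demo():
    """Compile a throwaway module and return (source path, path found in the .pyc)."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "hypothetical_module.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        pyc = py_compile.compile(src, cfile=os.path.join(d, "out.pyc"), doraise=True)
        return src, source_path_recorded_in_pyc(pyc)


src, recorded = demo()
print(recorded)  # the full path of the source file at compile time
```

On a developer machine that path typically includes the user name and the local directory layout.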


Working with bytecode files seems to be a fairly frequent need; for example, django-extensions provides the compile_pyc and clean_pyc management commands for this.

funnydman answered Oct 14 '22 05:10