Deploying python with docker, images too big

Tags: python, docker, amazon-web-services

We've built a large Python repo that uses lots of libraries (numpy, scipy, TensorFlow, ...) and have managed these dependencies through a conda environment. We have lots of developers contributing, and any time someone needs a new library for something they are working on, they 'conda install' it.

Fast forward to today, and we now need to deploy some applications that use our repo. We are deploying with Docker, but the images are really large (10+ GB), which is causing some issues. However, each individual application only uses a subset of the dependencies in the environment.yml.

Is there some easy strategy for dealing with this problem? In a sense, I need to know the dependencies of each application, but I'm not sure how to work them out in an automated way.

Any help here would be great. I'm new to this whole AWS, Docker, and Python deployment thing... We're really a bunch of engineers and scientists who need to scale up our software. We have something that works; it just seems like there has to be a better way 😁.

745

asked Oct 09 '18 11:10

matt

2 Answers

First, see if there are easy wins to shrink the image: use a small base image such as Alpine Linux, be very careful about what gets installed with the OS package manager, install dependencies or recommended packages only when they are truly required, and clean up artifacts such as package lists and big things you may not need, like Java.
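
One way to spot such easy wins is to measure what actually takes up space inside the environment. The following is a minimal sketch, not part of the original answer: the helper names entry_size and largest_entries are made up for illustration, and it simply sums file sizes under each direct child of a directory (for example, the site-packages folder inside the container).

```python
from __future__ import annotations

import os
import sysconfig
from pathlib import Path


def entry_size(path: Path) -> int:
    """Total size in bytes of a file, or of all regular files under a directory."""
    if path.is_file():
        return path.stat().st_size
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = Path(dirpath) / name
            # Skip symlinks so that linked files are not counted twice.
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def largest_entries(root: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` largest direct children of `root` as (name, bytes)."""
    sizes = [(child.name, entry_size(child)) for child in Path(root).iterdir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # Inspect the site-packages directory of the running interpreter.
    site_packages = sysconfig.get_paths()["purelib"]
    for name, size in largest_entries(site_packages):
        print(f"{size / 1e6:10.1f} MB  {name}")
```

Running this inside the built image gives a rough ranking of the heaviest packages, which helps decide what to drop or move into a separate application image.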

The base Anaconda/Ubuntu image is about 3.5 GB, so it's not crazy that with a lot of extra installations of heavy third-party packages you could get up to 10 GB. In production image-processing applications, I routinely worked with Docker images in the range of 3 GB to 6 GB, and those sizes were after we had heavily optimized the container.

To your question about splitting dependencies: give each application its own package definition, basically a setup.py script and some other details, including its dependencies listed in some mix of requirements.txt for pip and/or environment.yml for conda.
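
To get a first draft of each application's dependency list in an automated way, one option is to scan its source for import statements. The sketch below is an illustration rather than a complete tool: the function names are hypothetical, it only sees static imports, it will also report the repo's own internal modules, and import names do not always match package names (for example, sklearn is installed as scikit-learn). The default standard-library filter relies on sys.stdlib_module_names, which exists only on Python 3.10+.

```python
from __future__ import annotations

import ast
import sys


def top_level_imports(source: str) -> set[str]:
    """Collect the top-level module names imported by a piece of Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import (from . import x); skip those.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


def third_party_imports(source: str, stdlib: set[str] | None = None) -> set[str]:
    """Drop standard-library modules from the imported names."""
    if stdlib is None:
        # Available on Python 3.10+; falls back to no filtering on older versions.
        stdlib = set(getattr(sys, "stdlib_module_names", ()))
    return {name for name in top_level_imports(source) if name not in stdlib}


if __name__ == "__main__":
    example = "import os\nimport numpy as np\nfrom scipy.linalg import svd\n"
    print(sorted(third_party_imports(example)))
```

Applying this to every .py file that an application reaches gives a candidate list to review by hand before writing that application's requirements.txt or environment.yml.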

If you have Project A in one folder or repo and Project B in another, you want people to be able to run something like pip install <GitHub URL to a version tag of Project A> or conda env create -f ProjectB_environment.yml and, voilà, that application is installed.

Then, when you deploy a specific application, have a CI tool like Jenkins build the container for that application, using a FROM line to start from your thin Alpine (or whatever) base image and running conda install or pip install only for that project's dependency file, not for all the others.
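
As a rough illustration of that CI step, the sketch below renders a per-application Dockerfile from Python. Everything specific in it is an assumption made for the example: the function name render_dockerfile, the directory layout (one folder per application containing its own requirements.txt), and the python:3.11-slim base image, which stands in for whatever thin base image you choose.

```python
def render_dockerfile(app_dir: str, base_image: str = "python:3.11-slim") -> str:
    """Build Dockerfile text that installs only one application's dependencies.

    Assumes (hypothetically) that `app_dir` holds the application's code and
    its own requirements.txt.
    """
    lines = [
        f"FROM {base_image}",
        "WORKDIR /app",
        # Copy the dependency file first so this layer is cached between code changes.
        f"COPY {app_dir}/requirements.txt ./requirements.txt",
        # --no-cache-dir keeps pip's download cache out of the image.
        "RUN pip install --no-cache-dir -r requirements.txt",
        f"COPY {app_dir}/ ./",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "project_a" is a placeholder folder name.
    print(render_dockerfile("project_a"))
```

The CI job would write this text to a Dockerfile and run docker build from the repo root, once per application, so each image carries only that application's dependencies.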

This also has the benefit that different projects can declare different versions of the same libraries. Maybe Project A is ready to upgrade to the latest and greatest pandas version, but Project B needs some refactoring before the team wants to test that upgrade. When CI builds the container for Project B, it uses Project B's dependency file with one set of versions, while the dependency file in Project A's folder or repo can specify something different.

107

answered Sep 23 '22 21:09

ely

There are many ways to tackle this problem:

1. Lean Docker images: start with a very simple base image and layer your images. See the best practices for building images.
2. Specify each app's requirements in its own requirements.txt file (make sure you pin your versions), and see the specific instructions for conda.
3. Build and install "on demand": when you run docker build, install only the requirements for the specific application, rather than building one giant image for every possible eventuality.
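
The advice to pin versions can be checked automatically. The following sketch flags lines in a requirements.txt that are not pinned with an exact == version. It is a simplified check under stated assumptions, not a full parser of the requirements format: the function name is hypothetical, option lines starting with "-" are skipped, URL-style requirements are not handled, and wildcard pins such as ==1.* are treated as unpinned.

```python
from __future__ import annotations

import re

# name, optional [extras], "==", an exact version, optional environment marker.
_PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;*]+\s*(;.*)?$"
)


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = "numpy==1.26.4\nscipy>=1.10\npandas\n"
    print(unpinned_requirements(sample))
```

A check like this could run in CI before the image build, so that an unpinned dependency fails fast instead of silently changing what ends up in the image.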

answered Sep 26 '22 21:09

Burhan Khalid
