
How to use a python library that is constantly changing in a docker image or new container?

I organize my code in a python package (usually in a virtual environment like virtualenv and/or conda) and then usually call:

python <path_to/my_project/setup.py> develop

or

pip install -e <path_to/my_project>

so that I can use the most recent version of my code. Since I develop mostly statistical or machine learning algorithms, I prototype a lot and change my code daily. However, recently the recommended way to run our experiments on the clusters I have access to is through Docker. I learned about Docker and I think I have a rough idea of how to make it work, but I wasn't quite sure whether my solution was good or whether there might be better solutions out there.

The first solution I thought of was to copy my code into the Docker image with:

COPY /path_to/my_project /path_to/my_project
RUN pip install /path_to/my_project

and then pip installing it. The issue with this solution is that I have to actually build a new image each time, which seems silly, and I was hoping I could have something better. To do this I was thinking of having a bash file like:

#BASH FILE TO BUILD AND REBUILD MY STUFF
# build the image with the newest version of 
# my project code; it pip installs it and its dependencies
docker build -t image_name .
docker run --rm image_name python run_ML_experiment_file.py 
docker kill current_container # not sure how to get the id of the container
docker rmi image_name

As I said, my intuition tells me this is silly, so I was hoping there was a single-command way to do this with Docker or with a single Dockerfile. Also, note that the command should use -v ~/data/:/data to be able to get the data, plus some other volume/mount to write to (on the host) when it finishes training.
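
For reference, such a single command could look roughly like this (a sketch: image_name is the image built in the script above, and the ~/results host directory and /output mount point are made up for illustration):

docker run --rm \
    -v ~/data/:/data \
    -v ~/results/:/output \
    image_name \
    python run_ML_experiment_file.py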

Another solution I thought of was to have all the python dependencies (or other dependencies my library needs) in the Dockerfile (and hence in the image) and then somehow execute the installation of my library in the running container, maybe with docker exec [OPTIONS] CONTAINER COMMAND as:

docker exec CONTAINER pip install /path_to/my_project

in the running container. After that I could run the real experiment I want with the same exec command:

docker exec CONTAINER python run_ML_experiment_file.py

However, I still don't know how to systematically get the container id (because I probably don't want to look up the container id every time I do this).
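
One way to avoid looking up the container id by hand is to capture the id that docker run -d prints. A rough sketch (the bind mount of the project into the container and the sleep infinity used to keep it alive are my own assumptions, not part of the setup described above):

# start a long-lived container in the background and capture its id
CONTAINER=$(docker run -d \
    -v /path_to/my_project:/path_to/my_project \
    -v ~/data/:/data \
    image_name sleep infinity)

# install the current version of the library and run the experiment inside it
docker exec "$CONTAINER" pip install -e /path_to/my_project
docker exec "$CONTAINER" python run_ML_experiment_file.py

# remove the container when finished
docker rm -f "$CONTAINER"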

Ideally, in my head, the best conceptual solution would be to simply have the Dockerfile know from the beginning which path it should mount (i.e. /path_to/my_project) and then somehow do python /path_to/my_project/setup.py develop inside the image, so that it would always be linked to the potentially changing python package/project. That way I could run my experiments with a single docker command, as in:

docker run --rm -v ~/data/:/data image_name python run_ML_experiment_file.py

and not have to explicitly update the image myself every time (that includes not having to re-install parts of the image that should be static), since it is always in sync with the real library. Having some other script build a new image from scratch each time is not what I am looking for. It would also be nice to avoid writing any bash, if possible.


I think I am very close to a good solution. Instead of building a new image each time, I will simply run a CMD command to do python setup.py develop, as follows:

# install my library (only when a container is spun up)
CMD python ~/my_tf_proj/setup.py develop

The advantage is that it will only pip install my library whenever I run a new container. This solves the development issue, because re-creating a new image takes too long. Though I just realized that if I use the CMD command then I can't run other commands given to my docker run, so I actually mean to use ENTRYPOINT.
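
One common way to get that behaviour is a small entrypoint script that installs the library and then exec's whatever command was passed to docker run (a sketch of my own; the hypothetical entrypoint.sh below assumes the project is mounted at /my_tf_proj inside the container):

#!/bin/bash
# entrypoint.sh (hypothetical): install the mounted library, then run the requested command
set -e
pip install -e /my_tf_proj
exec "$@"

with the corresponding lines in the Dockerfile:

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["python", "/my_tf_proj/run_ML_experiment_file.py"]

With this, docker run --rm -v /absolute_path_to/my_tf_proj:/my_tf_proj image_name installs the current sources and runs the default experiment, while any command appended to docker run replaces the default CMD but still goes through the install step first.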

Right now the only issue in completing this is that I am having trouble with volumes, because I can't successfully link to my host project library from within the Dockerfile (which seems to require an absolute path for some reason). I am currently doing the following (which doesn't seem to work):

VOLUME /absolute_path_to/my_tf_proj /my_tf_proj

Why can't I link using the VOLUME command in my Dockerfile? My main intention with using VOLUME is to make my library (and other files that are always needed by this image) accessible when the CMD command tries to install my library. Is it possible to just have my library available all the time when a container is initiated?

Ideally I just want the library to be installed automatically when a container is run and, if possible, since the most recent version of the library is always required, have it installed when a container is initialized.
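
For what it is worth, my understanding is that VOLUME in a Dockerfile only declares a mount point inside the image; the host-side directory of a bind mount can only be supplied at run time with -v. A sketch (image_name and the paths are placeholders, not something from the Dockerfile below):

docker run --rm -v /absolute_path_to/my_tf_proj:/my_tf_proj image_name pip install -e /my_tf_proj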

As a reference, right now my non-working Dockerfile looks as follows:

# This means you derive your docker image from the tensorflow docker image
# FROM gcr.io/tensorflow/tensorflow:latest-devel-gpu
FROM gcr.io/tensorflow/tensorflow
#FROM python
FROM ubuntu

RUN mkdir ~/my_tf_proj/
# mounts my tensorflow lib/proj from host to the container
VOLUME /absolute_path_to/my_tf_proj

#
RUN apt-get update

#
RUN apt-get install -y vim

#
RUN apt-get install -qy python3
RUN apt-get install -qy python3-pip
RUN pip3 install --upgrade pip

#RUN apt-get install -y python python-dev python-distribute python-pip

# have the dependencies for my tensorflow library
RUN pip3 install numpy
RUN pip3 install keras
RUN pip3 install namespaces
RUN pip3 install pdb

# install my library (only when a container is spun up)
#CMD python ~/my_tf_proj/setup.py develop
ENTRYPOINT python ~/my_tf_proj/setup.py develop

As a side remark:

Also, for some reason it requires me to do RUN apt-get update to be able to even install pip or vim in my container. Does anyone know why? I wanted to do this because, in case I want to attach to the container with a bash terminal, having these tools would be really helpful.

Does Docker just force you to run apt-get update so that you always install the most recent version of software in the container?
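
As far as I understand, this is because most base images delete the apt package lists to keep the image small, so apt-get update has to repopulate them before any apt-get install can work. A common pattern (a sketch, not part of the Dockerfile above) chains the two in a single RUN and cleans up afterwards:

RUN apt-get update && apt-get install -y --no-install-recommends vim python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*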


Bounty:

What about a solution with COPY? And perhaps docker build -f path/Docker ..? See: How does one build a docker image from the home user directory?

asked Dec 08 '16 by Charlie Parker



1 Answer

During development it is IMO perfectly fine to map/mount the host directory with your ever-changing sources into the Docker container. The rest (the python version, the other libraries you depend upon) you can all install in the normal way in the docker container.

Once stabilized I remove the map/mount and add the package to the list of items to install with pip. I do have a separate container running devpi so I can pip-install packages whether I push them all the way to PyPI or just push them to my local devpi container.
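
For illustration only, a pip install against a local devpi index might look roughly like this (the host name, port, and index name are assumptions based on devpi defaults, not details from this answer):

pip install --index-url http://devpi.local:3141/root/dev/+simple/ my_package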

You can speed up container creation even if you use the common (but more limited) python path_to_project/setup.py develop. Your Dockerfile in this case should look like:

 # the following seldom changes, only when a package is added to setup.py
 COPY /some/older/version/of/project/plus/dependent/packages /older/setup
 RUN pip install /older/setup/your_package.tar.gz

 # the following changes all the time, but that is only a small amount of work
 COPY /latest/version/of/project /path_to_project
 RUN python /path_to_project/setup.py develop

If the first COPY would result in changes to files under /older/setup, then the image gets rebuilt from that point on.

Running python ... develop still takes more time and you need to rebuild/restart the container. Since my packages can all also just be copied in/linked to (in addition to being installed), that is still a large overhead. Instead, I run a small program in the container that checks whether the (mounted/mapped) sources have changed and then reruns anything I am developing/testing automatically. That way I only have to save a new version and watch the output of the container.
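
A rough sketch of what such a watch-and-rerun loop could look like, assuming inotify-tools is installed in the container and the sources are mounted at /my_tf_proj (the answer does not say how the real program is implemented):

#!/bin/bash
# re-install and re-run the experiment whenever a mounted source file changes
while inotifywait -r -e modify,create,delete /my_tf_proj; do
    pip install -e /my_tf_proj
    python /my_tf_proj/run_ML_experiment_file.py
done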

answered Sep 19 '22 by Anthon