
Why does my Docker cache get invalidated by this COPY command?

Tags:

docker

The Docker builders in my CI system are destroyed after a period of inactivity, which loses the local cache. To work around this, I first pull the most recent image from our quay.io repo and then pass it as --cache-from to the next build. I am running Docker version 17.12.0-ce. The pertinent part of the Dockerfile looks like:

FROM ubuntu:16.04
RUN apt-get update && apt-get install -y \
ant \
build-essential \
software-properties-common \
libncurses5-dev \
libncursesw5-dev \
libcurl4-openssl-dev \
libboost-dev \
libfreetype6-dev \
zlib1g-dev \
r-base \
default-jdk \
python-dev \
python-setuptools \
python-pip \
python3-dev \
python3-setuptools \
python3-pip \
git \
wget \
unzip \
ghostscript \
pkg-config


RUN mkdir /software
WORKDIR /software
ENV PATH="/software:${PATH}"

RUN git clone --branch v0.2.19 --single-branch https://github.com/xianyi/OpenBLAS
RUN cd OpenBLAS && make FC=gfortran TARGET=NEHALEM USE_THREAD=0 && make PREFIX=/opt/openblas install
ENV LD_LIBRARY_PATH="/opt/openblas/lib:${LD_LIBRARY_PATH}"

# Install samtools dependencies
RUN wget http://zlib.net/zlib-1.2.11.tar.gz && tar -xvf zlib-1.2.11.tar.gz
RUN cd zlib-1.2.11 && ./configure && make && make install
RUN wget http://bzip.org/1.0.6/bzip2-1.0.6.tar.gz && tar -xvf bzip2-1.0.6.tar.gz
RUN cd bzip2-1.0.6 && make && make install
RUN wget https://tukaani.org/xz/xz-5.2.3.tar.gz && tar -xvf xz-5.2.3.tar.gz
RUN cd xz-5.2.3 && ./configure && make && make install

RUN pip install common python-dateutil cython

RUN pip3 install common python-dateutil cython

# Install numpy 1.11.3 (python2/3)
RUN git clone --branch v1.11.3 --single-branch https://github.com/numpy/numpy
COPY /docker_image/site.cfg numpy/
RUN cd numpy && python setup.py install
RUN cd numpy && python3 setup.py install

On a clean machine with nothing in the cache, I first pull the image:

docker pull quay.io/myorganization/myimage:tag

and then run the build with

docker build --cache-from=quay.io/myorganization/myimage:tag -f docker_image/Dockerfile -t quay.io/myorganization/myimage:newtag .

the build uses the cache until COPY /docker_image/site.cfg numpy/ invalidates it. My .dockerignore looks like:

.git*

so changes there should not be the problem. If I have omitted any important information, please ask and I will provide it promptly. Any ideas on what could cause the cache invalidation at this particular spot would be highly appreciated.

Edit: This cache invalidation happens even if I don't change anything in the repo between builds. The steps are: build the image as tag1 and push it to quay.io; then, on a clean machine, clone the git repo, pull the image (tag1), and build the image as tag2. Could it be something that changes in the numpy repo metadata? (Note: to my understanding, --single-branch should not pull any info about other branches in that repo.)

user3274289 asked Jan 31 '18 21:01



1 Answer

The Docker cache for a COPY or ADD command uses a hash of the files and directories being copied. That hash includes the contents of every file and even the permissions on the files. So if any of these change by even a single byte, the hash will differ and Docker will have a cache miss, forcing the line to be rerun.

From the point of the first cache miss, all remaining lines will need to be rebuilt, since the preceding layer is now new and has not been used to run any of the following steps.
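
One way to check whether this applies to the question is to fingerprint the copied file on the machine that built tag1 and again on the clean machine, then compare the two results. The sketch below is only an illustration: the helper name fingerprint is made up here, it assumes GNU coreutils (sha256sum and stat -c), and it is not how Docker computes its own checksum. It covers the content and the permission bits and leaves the modification time out. A plausible cause worth ruling out, though not confirmed for this case, is the umask: git records only the executable bit, so the remaining permission bits of a fresh clone depend on the umask of the machine doing the clone.

```shell
#!/bin/sh
# fingerprint FILE
# Hypothetical helper: prints the SHA-256 of the file's content followed by
# its octal permission bits. The modification time is deliberately ignored.
# Assumes GNU coreutils; BSD/macOS stat takes different flags.
fingerprint() {
    content_hash=$(sha256sum < "$1" | cut -d' ' -f1)
    mode=$(stat -c '%a' "$1")
    printf '%s %s\n' "$content_hash" "$mode"
}

# Usage, with the path from the question:
#   fingerprint docker_image/site.cfg
```

If the two machines print different modes (for example 644 versus 664), setting the mode explicitly in CI before docker build, such as chmod 644 docker_image/site.cfg, should make the COPY input identical across machines.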

BMitch answered Sep 27 '22 19:09