I'm trying to generate and re-use a yarn install cache when building a Docker image with Docker BuildKit. The yarn cache is persisted in the .yarn/cache directory (relative to the build context root) and should never be included in the final image. The .yarn/cache directory should be shared among multiple builds, so that every build starts from a warm cache and yarn install runs fast, even on a cache miss caused by a change in package.json. If we could access the contents of .yarn/cache after docker build ends, it would be easy to share them between multiple builds, for example by uploading them to an Amazon S3 or GCS bucket.
I've considered two options:
- RUN --mount=type=bind
- RUN --mount=type=cache

Below I describe why neither of the two methods works.
RUN --mount=type=bind

The (simplified) Dockerfile looks like this:
# syntax=docker/dockerfile:1
FROM node:18   # base image assumed; the real Dockerfile is simplified here
WORKDIR /app
ENV YARN_CACHE_FOLDER=".yarn/cache"
COPY package.json yarn.lock ./
RUN --mount=type=bind,source=.yarn/cache,target=.yarn/cache,rw yarn install --frozen-lockfile
Unfortunately, no data is present in the .yarn/cache directory after the docker build command ends.
The reason no data is persisted is described in the documentation for the rw option: "Allow writes on the mount. Written data will be discarded." If the written data is discarded, what is a working method for generating the cache the first time?
RUN --mount=type=cache

Alternatively, I considered using RUN --mount=type=cache. Unfortunately there doesn't seem to be an easy way to persist the cache to a local directory on the build host so that it can be saved to an Amazon S3 or GCS bucket. If the cache is not persisted, we can't reuse it across different Cloud Builds, because the Docker daemon state is not shared between them.
To put it another way: what is the best method for sharing a cache directory between different docker build invocations running on different machines, without including this cache in the image? Is there any other way I'm missing here?
- RUN --mount=type=bind: allows mounting a directory as if it were local, but effectively doesn't allow writing to it, so I can't generate the cache on the first run.
- RUN --mount=type=cache: allows sharing the cache between multiple builds on the same machine, but doesn't help when running docker build on different machines, because each starts with an empty cache.

RUN --mount=type=cache is the correct approach here because, as you already discovered, rw access to the cache is necessary for it to be of any use across builds. Additionally, --cache-from and --cache-to explicitly do not include these types of cache mounts, so your cache will not be persisted across CI runs in this way.
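For reference, here is a minimal sketch of what the application's build Dockerfile could look like with such a cache mount; the base image, WORKDIR, and paths are assumptions, but the id value is what ties the build to the inject/extract steps described below:

# syntax=docker/dockerfile:1
FROM node:18              # base image assumed for illustration
WORKDIR /app
# Point yarn at the cache mount target
ENV YARN_CACHE_FOLDER="/app/.yarn/cache"
COPY package.json yarn.lock ./
# id=yarn must match the id used in the Dancefiles below
RUN --mount=type=cache,sharing=shared,id=yarn,target=/app/.yarn/cache \
  yarn install --frozen-lockfile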
What we therefore need are pre- and post-run steps that pull/push the contents of the cache mount from/to S3 before and after each run. You can achieve this "cache dance" as follows:
date --iso=ns | tee scratchdir/buildstamp
docker buildx build -f scratchdir/Dancefile.inject scratchdir

<run your buildx build here>

date --iso=ns | tee scratchdir/buildstamp
docker buildx build -f scratchdir/Dancefile.extract scratchdir
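Put together, a wrapper script for CI could look like the following sketch (the myapp:latest tag is an assumption, and the Dancefiles are assumed to already exist in scratchdir):

#!/usr/bin/env bash
set -euo pipefail

mkdir -p scratchdir

# 1. Inject: populate the buildkit cache mount from S3
date --iso=ns | tee scratchdir/buildstamp
docker buildx build -f scratchdir/Dancefile.inject scratchdir

# 2. The actual application build, using the same id=yarn cache mount
docker buildx build -t myapp:latest .

# 3. Extract: sync the (possibly updated) cache mount back to S3
date --iso=ns | tee scratchdir/buildstamp
docker buildx build -f scratchdir/Dancefile.extract scratchdir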
The timestamp in the scratchdir is necessary to bust the layer caching carried out by Docker. Create the inject/extract Dockerfiles in the scratchdir as well, and adjust them to suit your use case by adding more cache mounts and sync commands for each cache directory you want to sync to S3. The following example demonstrates transferring the .yarn/cache directory:
# Dancefile.inject
FROM peakcom/s5cmd:v2.0.0
COPY buildstamp buildstamp
RUN --mount=type=cache,sharing=shared,id=yarn,target=/builddir/.yarn/cache \
  /s5cmd sync s3://cache-bucket/yarn/* /builddir/.yarn/cache
# Dancefile.extract
FROM peakcom/s5cmd:v2.0.0
COPY buildstamp buildstamp
RUN --mount=type=cache,sharing=shared,id=yarn,target=/builddir/.yarn/cache \
  /s5cmd sync /builddir/.yarn/cache/* s3://cache-bucket/yarn/
With this process, the cache mount directories will be populated from S3 and available to any other buildkit build context that uses the same cache mount id. You can freely adjust the location of the cache mounts in your build Dockerfile, since cache mounts are identified by their id, not their mount point.
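Since the question also mentions GCS, a hypothetical variant of the inject step using the Google Cloud CLI image and gsutil rsync might look like the sketch below; the image tag and bucket name are assumptions, and credential wiring (e.g. a mounted service account key, or Cloud Build's metadata network) is left out:

# Dancefile.inject, GCS variant (hypothetical)
FROM gcr.io/google.com/cloudsdktool/google-cloud-cli:slim
COPY buildstamp buildstamp
RUN --mount=type=cache,sharing=shared,id=yarn,target=/builddir/.yarn/cache \
  gsutil -m rsync -r gs://cache-bucket/yarn /builddir/.yarn/cache

The extract counterpart would be the same with source and destination swapped.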
Further reading:
- s5cmd, as shown above, to further accelerate transfers.