Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cache Cargo dependencies in a Docker volume

I'm building a Rust program in Docker (rust:1.33.0).

Every time code changes, it re-compiles (good), which also re-downloads all dependencies (bad).

I thought I could cache dependencies by adding VOLUME ["/usr/local/cargo"]. edit I've also tried moving this dir with CARGO_HOME without luck.

I thought that making this a volume would persist the downloaded dependencies, which appear to be in this directory.

But it didn't work, they are still downloaded every time. Why?


Dockerfile

FROM rust:1.33.0

VOLUME ["/output", "/usr/local/cargo"]

RUN rustup default nightly-2019-01-29

COPY Cargo.toml .
COPY src/ ./src/

RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]

Built with just docker build ..

Cargo.toml

[package]
name = "mwe"
version = "0.1.0"
[dependencies]
log = { version = "0.4.6" }

Code: just hello world

Output of second run after changing main.rs:

...
Step 4/6 : COPY Cargo.toml .
---> Using cache
---> 97f180cb6ce2
Step 5/6 : COPY src/ ./src/
---> 835be1ea0541
Step 6/6 : RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
---> Running in 551299a42907
Updating crates.io index
Downloading crates ...
Downloaded log v0.4.6
Downloaded cfg-if v0.1.6
Compiling cfg-if v0.1.6
Compiling log v0.4.6
Compiling mwe v0.1.0 (/)
Finished dev [unoptimized + debuginfo] target(s) in 17.43s
Removing intermediate container 551299a42907
---> e4626da13204
Successfully built e4626da13204
like image 881
Mark Avatar asked Mar 01 '19 21:03

Mark


4 Answers

A volume inside the Dockerfile is counter-productive here. That would mount an anonymous volume at each build step, and again when you run the container. The volume during each build step is discarded after that step completes, which means you would need to download the entire contents again for any other step needing those dependencies.

The standard model for this is to copy your dependency specification, run the dependency download, copy your code, and then compile or run your code, in 4 separate steps. That lets docker cache the layers in an efficient manner. I'm not familiar with rust or cargo specifically, but I believe that would look like:

FROM rust:1.33.0

RUN rustup default nightly-2019-01-29

COPY Cargo.toml .
RUN cargo fetch # this should download dependencies
COPY src/ ./src/

RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]

Another option is to turn on some experimental features with BuildKit (available in 18.09, released 2018-11-08) so that docker saves these dependencies in what is similar to a named volume for your build. The directory can be reused across builds, but never gets added to the image itself, making it useful for things like a download cache.

# syntax=docker/dockerfile:experimental
FROM rust:1.33.0

VOLUME ["/output", "/usr/local/cargo"]

RUN rustup default nightly-2019-01-29

COPY Cargo.toml .
COPY src/ ./src/

RUN --mount=type=cache,target=/root/.cargo \
    ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]

Note that the above assumes cargo is caching files in /root/.cargo. You'd need to verify this and adjust as appropriate. I also haven't mixed the mount syntax with a json exec syntax to know if that part works. You can read more about the BuildKit experimental features here: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md

Turning on BuildKit from 18.09 and newer versions is as easy as export DOCKER_BUILDKIT=1 and then running your build from that shell.

like image 64
BMitch Avatar answered Oct 16 '22 13:10

BMitch


I would say, the nicer solution would be to resort to docker multi-stage build as pointed here and there

This way you can create yourself a first image, that would build both your application and your dependencies, then use, only, in the second image, the dependency folder from the first one

This is inspired by both your comment on @Jack Gore's answer and the two issue comments linked here above.

FROM rust:1.33.0 as dependencies

WORKDIR /usr/src/app

COPY Cargo.toml .

RUN rustup default nightly-2019-01-29 && \
    mkdir -p src && \
    echo "fn main() {}" > src/main.rs && \
    cargo build -Z unstable-options --out-dir /output

FROM rust:1.33.0 as application

# Those are the lines instructing this image to reuse the files 
# from the previous image that was aliased as "dependencies" 
COPY --from=dependencies /usr/src/app/Cargo.toml .
COPY --from=dependencies /usr/local/cargo /usr/local/cargo

COPY src/ src/

VOLUME /output

RUN rustup default nightly-2019-01-29  && \
    cargo build -Z unstable-options --out-dir /output

PS: having only one run will reduce the number of layers you generate; more info here

like image 23
β.εηοιτ.βε Avatar answered Oct 16 '22 12:10

β.εηοιτ.βε


Here's an overview of the possibilities. (Scroll down for my original answer.)

  • Add Cargo files, create fake main.rs/lib.rs, then compile dependencies. Afterwards remove the fake source and add the real ones. [Caches dependencies, but several fake files with workspaces].
  • Add Cargo files, create fake main.rs/lib.rs, then compile dependencies. Afterwards create a new layer with the dependencies and continue from there. [Similar to above].
  • Externally mount a volume for the cache dir. [Caches everything, relies on caller to pass --mount].
  • Use RUN --mount=type=cache,target=/the/path cargo build in the Dockerfile in new Docker versions. [Caches everything, seems like a good way, but currently too new to work for me. Executable not part of image]
  • Run sccache in another container or on the host, then connect to that during the build process. See this comment in Cargo issue 2644.
  • Use cargo-build-deps. [Might work for some, but does not support Cargo workspaces (in 2019)].
  • Wait for Cargo issue 2644. [There's willingness to add this to Cargo, but no concrete solution yet].
  • Using VOLUME ["/the/path"] in the Dockerfile does NOT work, this is per-layer (per command) only.

Note: one can set CARGO_HOME and ENV CARGO_TARGET_DIR in the Dockerfile to control where download cache and compiled output goes.

Also note: cargo fetch can at least cache downloading of dependencies, although not compiling.

Cargo workspaces suffer from having to manually add each Cargo file, and for some solutions, having to generate a dozen fake main.rs/lib.rs. For projects with a single Cargo file, the solutions work better.


I've got caching to work for my particular case by adding

ENV CARGO_HOME /code/dockerout/cargo
ENV CARGO_TARGET_DIR /code/dockerout/target

Where /code is the directory where I mount my code.

This is externally mounted, not from the Dockerfile.

EDIT1: I was confused why this worked, but @b.enoit.be and @BMitch cleared up that it's because volumes declared inside the Dockerfile only live for one layer (one command).

like image 3
Mark Avatar answered Oct 16 '22 14:10

Mark


You do not need to use an explicit Docker volume to cache your dependencies. Docker will automatically cache the different "layers" of your image. Basically, each command in the Dockerfile corresponds to a layer of the image. The problem you are facing is based on how Docker image layer caching works.

The rules that Docker follows for image layer caching are listed in the official documentation:

  • Starting with a parent image that is already in the cache, the next instruction is compared against all child images derived from that base image to see if one of them was built using the exact same instruction. If not, the cache is invalidated.

  • In most cases, simply comparing the instruction in the Dockerfile with one of the child images is sufficient. However, certain instructions require more examination and explanation.

  • For the ADD and COPY instructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the file(s) are not considered in these checksums. During the cache lookup, the checksum is compared against the checksum in the existing images. If anything has changed in the file(s), such as the contents and metadata, then the cache is invalidated.

  • Aside from the ADD and COPY commands, cache checking does not look at the files in the container to determine a cache match. For example, when processing a RUN apt-get -y update command the files updated in the container are not examined to determine if a cache hit exists. In that case just the command string itself is used to find a match.

Once the cache is invalidated, all subsequent Dockerfile commands generate new images and the cache is not used.

So the problem is with the positioning of the command COPY src/ ./src/ in the Dockerfile. Whenever there is a change in one of your source files, the cache will be invalidated and all subsequent commands will not use the cache. Therefore your cargo build command will not use the Docker cache.

To solve your problem it will be as simple as reordering the commands in your Docker file, to this:

FROM rust:1.33.0

RUN rustup default nightly-2019-01-29

COPY Cargo.toml .

RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]

COPY src/ ./src/

Doing it this way, your dependencies will only be reinstalled when there is a change in your Cargo.toml.

Hope this helps.

like image 2
Jack Avatar answered Oct 16 '22 12:10

Jack