
docker build with maven - how to prevent re-downloading dependencies

Tags: java, docker, maven

I want the base image, mavenDeps, to download the dependencies and rebuild only when the dependencies change, and the second image, mavenBuild, to rebuild on code changes. However, on docker build . both Maven commands download all dependencies. I might be misunderstanding how the stacking works or what to copy where.

What I have tried: explicitly copying everything from the first container to the second (COPY / / and various more specific COPY targets such as .m2), and building the second container from the maven base image, like the first one, then copying everything from the first container.

Dockerfile:

FROM maven:3.5-jdk-8 as mavenDeps
COPY pom.xml pom.xml
RUN mvn dependency:resolve

FROM mavenDeps as mavenBuild
RUN mvn install

FROM java:8
COPY --from=mavenBuild ./target/*.jar ./
ENV JAVA_OPTS ""
CMD [ "bash", "-c", "java ${JAVA_OPTS} -jar *.jar -v"]

I am building with Docker Desktop 2.2.2.0 (engine 19.03.5) on macOS.

EDIT 2020.03.04:

The answer from @gcallea effectively prevents re-downloading of the dependencies listed in the pom file, +1. However, the install step still pulls 100+ artifacts on each build triggered by a code change. Those are transitive dependencies of maven-resources-plugin, maven-compiler-plugin and several other plugins, which are not listed anywhere explicitly.

I need to work offline sometimes and would like to preload ALL dependencies, so no dependencies are pulled after code changes.

kostja asked Mar 04 '20 09:03




2 Answers

Before explaining how I would proceed, let me explain the issue you are running into.

Your Dockerfile relies on Docker's multi-stage build feature.
Stages are intermediate images whose layers are not kept in the final image. To keep files or folders from one stage in another, you have to copy them explicitly, as you did.

Concretely, in the instructions below, Maven resolves all dependencies specified in your pom.xml and stores them in the local repository, which lives in the layer of that stage:

FROM maven:3.5-jdk-8 as mavenDeps
COPY pom.xml pom.xml
RUN mvn dependency:resolve

But as said, a stage's content is not kept by default. So all the dependencies downloaded to the local Maven repository are lost, since you never copy them into the next stage:

FROM mavenDeps as mavenBuild
RUN mvn install

Since the local repository of that image is empty, mvn install re-downloads all dependencies.


How to proceed?

There are many ways to do it.
The best choice depends on your requirements.
Whichever you choose, the build strategy in terms of Docker layers looks like this:

Build stage (Maven image):

  • copy the pom to the image
  • download the dependencies and plugins.
    For this, mvn dependency:resolve-plugins chained with mvn dependency:resolve may do the job, but not always.
    Why? Because these goals and the actual package execution may rely on different artifacts/plugins, and even for the same artifact/plugin they may pull different versions. A safer, though potentially slower, approach is to resolve the dependencies by running exactly the mvn package command used for the build (which pulls exactly the dependencies you need), while skipping source compilation and deleting the target folder. This makes the step faster and prevents any unwanted layer change detection for it.
  • copy the source code to the image
  • package the application

Run stage (JDK or JRE image):

  • copy the jar from the previous stage

1) No explicit cache for Maven dependencies: straightforward, but annoying when the pom changes frequently

Use this if re-downloading all dependencies on every pom.xml change is acceptable.

An example, starting from your Dockerfile:

########build stage########
FROM maven:3.5-jdk-8 as maven_build
WORKDIR /app

COPY pom.xml .
# To resolve dependencies in a safe way (no re-download when the source code changes)
RUN mvn clean package -Dmaven.main.skip -Dmaven.test.skip && rm -r target

# To package the application
COPY src ./src
RUN mvn clean package -Dmaven.test.skip

########run stage########
FROM java:8
WORKDIR /app

COPY --from=maven_build /app/target/*.jar ./

#run the app
ENV JAVA_OPTS ""
CMD [ "bash", "-c", "java ${JAVA_OPTS} -jar *.jar -v"]

The drawback of this solution? Any change to pom.xml means re-creating the whole layer that downloads and stores the Maven dependencies.
That is generally not acceptable for applications with many dependencies, especially if you don't use a Maven repository manager during the image build.

2) Explicit cache for Maven dependencies: requires more configuration and the use of BuildKit, but is more efficient because only the required dependencies are downloaded

The only change here is that the downloaded Maven dependencies are cached in the Docker builder cache:

# syntax=docker/dockerfile:experimental
########build stage########
FROM maven:3.5-jdk-8 as maven_build
WORKDIR /app

COPY pom.xml .    
COPY src ./src

RUN --mount=type=cache,target=/root/.m2 mvn clean package  -Dmaven.test.skip

########run stage########
FROM java:8
WORKDIR /app

COPY --from=maven_build /app/target/*.jar ./

#run the app
ENV JAVA_OPTS ""
CMD [ "bash", "-c", "java ${JAVA_OPTS} -jar *.jar -v"]

To enable BuildKit, set the environment variable DOCKER_BUILDKIT=1 (wherever suits you: .bashrc, the command line, ...) or turn BuildKit on in the Docker daemon's JSON configuration file.
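As a minimal sketch of the same cache-mount approach on a more recent Docker: the docker/dockerfile:1 syntax line is an assumption on my side (the current stable Dockerfile frontend accepts RUN --mount=type=cache without the experimental tag), and my-app in the build command is a hypothetical image tag. Everything else mirrors the Dockerfile above.

```dockerfile
# syntax=docker/dockerfile:1
# Assumption: built with BuildKit enabled, for example
#   DOCKER_BUILDKIT=1 docker build -t my-app .
# ("my-app" is a hypothetical tag, not something from the question.)

######## build stage ########
FROM maven:3.5-jdk-8 AS maven_build
WORKDIR /app

COPY pom.xml .
COPY src ./src

# /root/.m2 is kept in the BuildKit cache rather than in an image layer,
# so artifacts fetched by earlier builds are still available after
# pom.xml or src changes; only missing artifacts are downloaded.
RUN --mount=type=cache,target=/root/.m2 mvn clean package -Dmaven.test.skip

######## run stage ########
FROM java:8
WORKDIR /app

# COPY needs an explicit destination
COPY --from=maven_build /app/target/*.jar ./

# run the app
ENV JAVA_OPTS=""
CMD [ "bash", "-c", "java ${JAVA_OPTS} -jar *.jar -v" ]
```

If you would rather not set the variable on every build, the daemon configuration accepts { "features": { "buildkit": true } } for the same effect.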

davidxxx answered Sep 21 '22 22:09


You don't need to split the build phase into two separate stages, mavenDeps and mavenBuild. A single build stage that takes advantage of Docker layer caching serves the same purpose.

You can structure your Dockerfile as follows:

#----
# Build stage
#----
FROM maven:3.5-jdk-8 as buildstage
# Copy only pom.xml of your projects and download dependencies
COPY pom.xml .
RUN mvn -B -f pom.xml dependency:go-offline
# Copy all other project files and build project
COPY . .
RUN mvn -B install

#----
# Final stage
#----
FROM java:8
COPY --from=buildstage ./target/*.jar ./
ENV JAVA_OPTS ""
CMD [ "bash", "-c", "java ${JAVA_OPTS} -jar *.jar -v"]

This way, the dependencies are re-downloaded only when pom.xml changes. Otherwise, the Docker layer produced by RUN mvn -B -f pom.xml dependency:go-offline is reused from the cache.
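If you want the build to fail, rather than quietly download, when something was not pre-fetched, one possible variation is to run the second Maven step in offline mode. This is only a sketch: -o is Maven's offline flag, and whether dependency:go-offline really fetches everything the install step needs depends on your plugins, so the offline step may fail until the missing artifacts are resolved in the first step.

```dockerfile
FROM maven:3.5-jdk-8 AS buildstage
# Pre-fetch dependencies and plugins; this layer is reused
# from the cache as long as pom.xml is unchanged
COPY pom.xml .
RUN mvn -B -f pom.xml dependency:go-offline
# Copy the rest of the project
COPY . .
# Assumption: everything the build needs is already in the local repository.
# -o (offline) makes Maven fail instead of downloading missing artifacts.
RUN mvn -B -o install
```

The failure message names the artifact Maven could not find locally, which tells you what the pre-fetch step still misses.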

gregorycallea answered Sep 21 '22 22:09