Assume BuildKit is enabled with export DOCKER_BUILDKIT=1.
Take main.py:
# busy loop that never exits and keeps one CPU core at 100%
i = 0
while True:
    i += 1
And this Dockerfile:
FROM python:3.9-slim as base
COPY main.py .
FROM base as part_1
RUN echo "A" && python -m main
FROM base as part_2
RUN echo "B" && python -m main
FROM base as combined
COPY --from=part_1 . .
COPY --from=part_2 . .
Running docker build --no-cache . followed by top shows that the build is parallelized across 2 cores, as expected from BuildKit:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22569 root 20 0 14032 11620 4948 R 100.0 0.0 0:10.43 python
22571 root 20 0 14032 11620 4948 R 100.0 0.0 0:10.34 python
But after removing the echos from the Dockerfile:
FROM python:3.9-slim as base
COPY main.py .
FROM base as part_1
RUN python -m main
FROM base as part_2
RUN python -m main
FROM base as combined
COPY --from=part_1 . .
COPY --from=part_2 . .
and rerunning docker build --no-cache . followed by top, the build only uses one core (the second process listed is unrelated), which is unexpected from BuildKit:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24674 root 20 0 14032 11624 4952 R 100.0 0.0 1:00.40 python
2485 mishac 20 0 5824548 515428 126120 S 12.3 1.6 2:52.74 gnome-s+
Why does removing the echos disable the parallelization? It seems like an odd thing to affect it. Is it possible to keep the parallelization without the echos?
Version:
$ docker --version
Docker version 20.10.16, build aa7e414
BuildKit uses a low-level build definition format (LLB) to compute a content-addressable dependency graph. This lets it optimize the build by tracking the checksums (digests) of the nodes in the build graph directly. All stages are analyzed before any processing is done.
Since you start from the same base image and execute the same RUN command in each stage, BuildKit determines that both stages will produce the same output and performs the operation only once.
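To make the deduplication concrete, here is a toy Python model. It is an assumption for illustration only, not BuildKit's real LLB serialization or digest scheme: each step's identity is a SHA-256 hash of its parent's identity plus its own instruction text, so two stages with identical ancestry and identical commands end up with the same identity and collapse into one vertex. The names step_digest, stage_digest and distinct_run_vertices are hypothetical.

```python
from __future__ import annotations

import hashlib


def step_digest(parent: str, instruction: str) -> str:
    """Toy identity of a build step: hash of the parent's identity plus this step's text."""
    return hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()


def stage_digest(parent: str, instructions: list[str]) -> str:
    """Chain step digests so a stage's identity depends on everything before it."""
    digest = parent
    for instruction in instructions:
        digest = step_digest(digest, instruction)
    return digest


def distinct_run_vertices(run_commands: list[str]) -> int:
    """Count how many distinct vertices stages built FROM the same base collapse into."""
    base = stage_digest("python:3.9-slim", ["COPY main.py ."])
    return len({stage_digest(base, [cmd]) for cmd in run_commands})


# Without the echos, part_1 and part_2 have identical definitions: one vertex.
print(distinct_run_vertices(["RUN python -m main", "RUN python -m main"]))  # 1

# With the echos, the commands differ, so the digests (and vertices) differ.
print(distinct_run_vertices([
    'RUN echo "A" && python -m main',
    'RUN echo "B" && python -m main',
]))  # 2
```

In the real LLB graph the digest covers the whole serialized operation (command, environment, working directory, mounts and so on), which is why changing any of those, not just the command text, produces a separate vertex.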
When you add the echo command, you introduce a difference in the dependency graph that causes BuildKit to build two separate images, which it does in parallel as you expect. If you RUN a different script or COPY some unique file(s) in each stage, they will build in parallel. Even just setting a unique ENV in each stage is enough to trigger this.
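A second toy sketch, again a simplified model rather than BuildKit's actual scheduler, folds each stage's environment into the RUN step's identity and then executes every distinct vertex once on a thread pool. With identical environments only one vertex runs; with different ENV values two vertices exist and they run concurrently. run_digest, build and the work_seconds parameter are hypothetical names, and time.sleep merely stands in for a long-running RUN command.

```python
from __future__ import annotations

import hashlib
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_digest(parent: str, env: dict[str, str], command: str) -> str:
    """Toy RUN-step identity: the environment is part of the step's definition."""
    env_part = ",".join(f"{k}={v}" for k, v in sorted(env.items()))
    return hashlib.sha256(f"{parent}|{env_part}|{command}".encode()).hexdigest()


def build(stage_envs: dict[str, dict[str, str]], work_seconds: float = 0.1) -> int:
    """Run each distinct RUN vertex once, in parallel; return how many actually executed."""
    digests = {
        name: run_digest("base", env, "RUN /run/test.sh")
        for name, env in stage_envs.items()
    }
    unique = set(digests.values())  # identical definitions collapse here
    executed: list[str] = []
    lock = threading.Lock()

    def execute(digest: str) -> None:
        with lock:
            executed.append(digest)
        time.sleep(work_seconds)  # stand-in for the real work of the RUN step

    # One worker per distinct vertex, so independent vertices overlap in time.
    with ThreadPoolExecutor(max_workers=max(1, len(unique))) as pool:
        list(pool.map(execute, unique))
    return len(executed)


print(build({"first": {}, "second": {}}))                        # 1
print(build({"first": {"test": "A"}, "second": {"test": "B"}}))  # 2
```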
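Below is a very minimal test that demonstrates this behavior (using alpine as a base image, which is only around 5.5MB).
test.sh (made executable with chmod +x test.sh, since the Dockerfile runs it directly):
#!/bin/sh
sleep 10
touch /test
Dockerfile:
FROM alpine AS base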
WORKDIR /run
COPY ./test.sh .
FROM base AS first
RUN /run/test.sh
FROM base AS second
RUN /run/test.sh
FROM base AS output
COPY --from=first /test .
COPY --from=second /test .
Build it with:
sudo DOCKER_BUILDKIT=1 docker build --no-cache .
You can see that the first stage is skipped and the second stage takes just over 10 seconds to complete. Yet the COPY command in the output stage has no trouble reading from the first stage.
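The output stage can still COPY --from=first because results are looked up by content, not by stage name. A toy memoizing solver illustrates this; the Solver class and its define/solve methods are hypothetical and are not BuildKit's API. Both stage names resolve to the same digest, so they share one cached result produced by a single execution.

```python
from __future__ import annotations

import hashlib


class Solver:
    """Toy memoizing solver: results are cached by digest, not by stage name."""

    def __init__(self) -> None:
        self.stage_digests: dict[str, str] = {}  # stage name -> digest of its definition
        self.cache: dict[str, str] = {}          # digest -> (pretend) build result
        self.executions = 0                      # how many vertices actually ran

    def define(self, name: str, definition: str) -> None:
        self.stage_digests[name] = hashlib.sha256(definition.encode()).hexdigest()

    def solve(self, name: str) -> str:
        digest = self.stage_digests[name]
        if digest not in self.cache:
            self.executions += 1  # only the first request for a digest does real work
            self.cache[digest] = f"/test from vertex {digest[:12]}"
        return self.cache[digest]


same_definition = "alpine | WORKDIR /run | COPY ./test.sh . | RUN /run/test.sh"
solver = Solver()
solver.define("first", same_definition)
solver.define("second", same_definition)

# Both COPY --from=... lookups succeed and return the same shared result.
from_first = solver.solve("first")
from_second = solver.solve("second")
print(from_first == from_second, solver.executions)  # True 1
```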
Now, if we add an ENV with a unique value in each stage...
FROM alpine AS base
WORKDIR /run
COPY ./test.sh .
FROM base AS first
ENV test=A
RUN /run/test.sh
FROM base AS second
ENV test=B
RUN /run/test.sh
FROM base AS output
COPY --from=first /test .
COPY --from=second /test .