Some of our Docker images need to download large binaries from a Nexus server or from the Internet; the server distributes Java, Node.js, and mobile (Android and iOS) apps. The download is done with either the ADD or the RUN instruction, for instance:
RUN curl -o docker https://get.docker.com/builds/Linux/x86_64/docker-latest
Given that "docker build" walks through the instructions and decides whether to reuse the cache based on the mtime of the file (https://stackoverflow.com/a/26612694/433814), what approach takes advantage of the caching mechanism while building these images, so that the entire binary is not downloaded again?
A related concern is the opposite case: if the resource changes on the server, Docker will not download the latest version.
Docker will NOT consult any caching mechanism before downloading with "RUN curl" or ADD; it simply repeats the download step. However, Docker does invalidate its build cache if, among other things, the mtime of a file has changed (https://stackoverflow.com/a/26612694/433814). See https://github.com/docker/docker/blob/master/pkg/tarsum/versioning.go#L84
The strategy I've been working on to solve this problem, when building Dockerfiles with dependencies from file storage or a repository such as Nexus or Amazon S3, is to retrieve the ETag of the resource, cache it, and modify the mtime of a cache-flag file when the ETag changes (https://gist.github.com/marcellodesales/721694c905dc1a2524bc#file-s3update-py-L18). It follows the approach used for Python (https://stackoverflow.com/a/25307587) and Node.js (http://bitjudo.com/blog/2014/03/13/building-efficient-dockerfiles-node-dot-js/) projects.
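The core of the cache-flag idea can be sketched as a small shell helper. This is only an illustration under assumptions: the function name update_cache_flag and its arguments are hypothetical and do not come from the gist. It rewrites the flag file only when the ETag differs from the stored one, so an unchanged resource leaves the file untouched.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the gist): rewrite the cache-flag file
# only when the ETag differs from the one stored by the previous run.
update_cache_flag() {
    new_etag=$1
    flag_file=$2
    if [ -f "$flag_file" ] && [ "$(cat "$flag_file")" = "$new_etag" ]; then
        # Same ETag as last time: leave the file (content and mtime) alone.
        echo "unchanged"
    else
        # First run or changed resource: store the new ETag.
        printf '%s\n' "$new_etag" > "$flag_file"
        echo "updated"
    fi
}

# Example with a temporary flag file and made-up ETag values.
flag=$(mktemp)
update_cache_flag '"v1"' "$flag"   # prints "updated"
update_cache_flag '"v1"' "$flag"   # prints "unchanged"
update_cache_flag '"v2"' "$flag"   # prints "updated"
rm -f "$flag"
```

Because the flag file is added to the image before the download step, a build only repeats the download when this helper reports "updated".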
Here's what we can do:
Here's a setup to demo this strategy:
Create a web server that handles HEAD requests and returns an ETag header, as most servers do.
Build an image and verify that the dependent layer downloads the resource the first time.
Rebuild the image and verify that the dependent layer uses the cached value.
Change the ETag value returned by the web server handler to simulate a change.
Rebuild the image and verify that the resource is downloaded again; then rebuild once more and verify that the cache is used.
Suppose you have the following Node.js server serving files. Let's implement a HEAD operation and return a value.
// You'll see the client-side output on the console when you run it.
var restify = require('restify');

// Server
var server = restify.createServer({
  name: 'myapp',
  version: '1.0.0'
});

// HEAD returns only the headers, including the ETag that the build script reads.
server.head("/", function (req, res, next) {
  res.writeHead(200, {'Content-Type': 'application/json; charset=utf-8',
                      'ETag': '"{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8555}}"'});
  res.end();
  return next();
});

// GET returns the same ETag together with the file body.
server.get("/", function (req, res, next) {
  res.writeHead(200, {'Content-Type': 'application/json; charset=utf-8',
                      'ETag': '"{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8555}}"'});
  res.write("The file to be downloaded");
  res.end();
  return next();
});

// Listen on 8181, the port shown in the output and queried by the build script.
server.listen(8181, function () {
  console.log('%s listening at %s', server.name, server.url);
});

// Client
var client = restify.createJsonClient({
  url: 'http://localhost:8181',
  version: '~1.0'
});

client.head('/', function (err, req, res, obj) {
  if (err) console.log("An error occurred:", err);
  else console.log('HEAD / returned headers: %j', res.headers);
});
Executing this will give you:
mdesales@ubuntu [11/27/201411:10:49] ~/dev/icode/fuego/interview (feature/supportLogAuditor *) $ node testserver.js
myapp listening at http://0.0.0.0:8181
HEAD / returned headers: {"content-type":"application/json; charset=utf-8",
"etag":"\"{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8555}}\"",
"date":"Thu, 27 Nov 2014 19:10:50 GMT","connection":"keep-alive"}
Consider the following build script that caches the ETag Header in a file.
#!/bin/sh
# Delete the existing first, and get the headers of the server to a file "headers.txt"
# Grep the ETag to a "new-docker.etag" file
# If the file exists, verify if the ETag has changed and/or move/modify the mtime of the file
# Proceed with the "docker build" as usual
rm -f new-docker.etag
curl -I -D headers.txt http://192.168.248.133:8181/ && \
grep -o 'ETag[^*]*' headers.txt > new-docker.etag && \
rm -f headers.txt
if [ ! -f docker.etag ]; then
    cp new-docker.etag docker.etag
else
    # docker.etag holds the ETag seen by the previous build;
    # new-docker.etag holds the one just fetched.
    old=$(cat docker.etag)
    new=$(cat new-docker.etag)
    echo "Old ETag = $old"
    echo "New ETag = $new"
    if [ "$old" != "$new" ]; then
        mv new-docker.etag docker.etag
        touch -t 200001010000.00 docker.etag
    fi
fi
docker build -t platform.registry.docker.corp.intuit.net/container/mule:3.4.1 .
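One fragile spot in a script like this is the header parsing: HTTP header names are case-insensitive (servers speaking HTTP/2 send "etag:" in lowercase), and the header file written by curl uses CRLF line endings. A more tolerant extraction might look like the sketch below. The function name extract_etag is hypothetical, and the sketch reads headers from standard input so it can be tried without a server.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original build.sh): print the value of
# the ETag header found in raw HTTP response headers read from standard input.
extract_etag() {
    # Drop carriage returns, match the header name case-insensitively,
    # strip the "ETag:" prefix and following blanks, keep the first match.
    tr -d '\r' | sed -n 's/^[Ee][Tt][Aa][Gg]:[[:space:]]*//p' | head -n 1
}

# Example with canned headers instead of a live server.
printf 'HTTP/1.1 200 OK\r\nETag: "abc123"\r\nConnection: keep-alive\r\n' | extract_etag
# prints "abc123" (including the double quotes)
```

In the build script, this could replace the grep step by piping the output of "curl -sI" into the function, at the cost of storing only the value rather than the whole "ETag: ..." line.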
Running the build script produces the following output; here every step is served from the existing cache.
mdesales@ubuntu [11/27/201411:54:08] ~/dev/github-intuit/docker-images/platform/mule-3.4 (master) $ ./build.sh
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
ETag: "{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8555}}"
Date: Thu, 27 Nov 2014 19:54:16 GMT
Connection: keep-alive
Old ETag = ETag: "{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8555}}"
New ETag = ETag: "{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8555}}"
Sending build context to Docker daemon 51.71 kB
Sending build context to Docker daemon
Step 0 : FROM core.registry.docker.corp.intuit.net/runtime/java:7
---> 3eb1591273f5
Step 1 : MAINTAINER [email protected]
---> Using cache
---> 9bb8fff83697
Step 2 : WORKDIR /opt
---> Using cache
---> 3e3c96d96fc9
Step 3 : ADD docker.etag /tmp/docker.etag
---> Using cache
---> db3f82289475
Step 4 : RUN cat /tmp/docker.etag
---> Using cache
---> 0d4147a5f5ee
Step 5 : RUN curl -o docker https://get.docker.com/builds/Linux/x86_64/docker-latest
---> Using cache
---> 6bd6e75be322
Successfully built 6bd6e75be322
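The Dockerfile itself is not shown in this answer. Judging from the "Step N" lines in the build output above, it probably looks roughly like the file written by the sketch below. The MAINTAINER address is a placeholder, since the real one is not legible in the log, and the comments are my reading of why the instructions are in this order.

```shell
#!/bin/sh
# Reconstructed from the "Step N" lines of the build output; the MAINTAINER
# address is a placeholder, not the original value.
cat > Dockerfile <<'EOF'
FROM core.registry.docker.corp.intuit.net/runtime/java:7
MAINTAINER maintainer@example.com
WORKDIR /opt

# Cache-flag file: when it changes, this layer and every later layer
# are rebuilt instead of being taken from the cache.
ADD docker.etag /tmp/docker.etag
RUN cat /tmp/docker.etag

# The expensive download; reused from the cache while docker.etag is unchanged.
RUN curl -o docker https://get.docker.com/builds/Linux/x86_64/docker-latest
EOF
```

The important design choice is the ordering: the ADD of the cache-flag file has to come before the RUN that downloads the binary, because a cache miss on one instruction forces all following instructions to run again.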
Changing the ETag value on the server and restarting it, to simulate an update, causes the cache-flag file to be updated and the cache to be invalidated. For instance, the ETag was changed to "465fb0d9b9f143ad691c7c3bcf3801b47284f8333". Rebuilding triggers a new download because the ETag file was updated, and Docker detects that at the "ADD" instruction (step 3), so every later step, including the download in step 5, runs again. Note that in the captured output the "Old ETag" and "New ETag" labels are reversed (the value ending in 8333 is the new one); the inequality check is symmetric, so the result is the same.
mdesales@ubuntu [11/27/201411:54:16] ~/dev/github-intuit/docker-images/platform/mule-3.4 (master) $ ./build.sh
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
ETag: "{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8333}}"
Date: Thu, 27 Nov 2014 19:54:45 GMT
Connection: keep-alive
Old ETag = ETag: "{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8333}}"
New ETag = ETag: "{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8555}}"
Sending build context to Docker daemon 50.69 kB
Sending build context to Docker daemon
Step 0 : FROM core.registry.docker.corp.intuit.net/runtime/java:7
---> 3eb1591273f5
Step 1 : MAINTAINER [email protected]
---> Using cache
---> 9bb8fff83697
Step 2 : WORKDIR /opt
---> Using cache
---> 3e3c96d96fc9
Step 3 : ADD docker.etag /tmp/docker.etag
---> ac3b200c8cdc
Removing intermediate container 4cf0040dbc43
Step 4 : RUN cat /tmp/docker.etag
---> Running in 4dd38d30549a
ETag: "{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8333}}"
---> 4fafbeac2180
Removing intermediate container 4dd38d30549a
Step 5 : RUN curl -o docker https://get.docker.com/builds/Linux/x86_64/docker-latest
---> Running in de920c7a2e28
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.5M  100 13.5M    0     0  1361k      0  0:00:10  0:00:10 --:--:-- 2283k
---> 95aff324da85
Removing intermediate container de920c7a2e28
Successfully built 95aff324da85
Since the ETag hasn't changed this time, the cache-flag file stays the same and Docker does a very fast build using the cache.
mdesales@ubuntu [11/27/201411:54:56] ~/dev/github-intuit/docker-images/platform/mule-3.4 (master) $ ./build.sh
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
ETag: "{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8333}}"
Date: Thu, 27 Nov 2014 19:54:58 GMT
Connection: keep-alive
Old ETag = ETag: "{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8333}}"
New ETag = ETag: "{SHA1{465fb0d9b9f143ad691c7c3bcf3801b47284f8333}}"
Sending build context to Docker daemon 51.71 kB
Sending build context to Docker daemon
Step 0 : FROM core.registry.docker.corp.intuit.net/runtime/java:7
---> 3eb1591273f5
Step 1 : MAINTAINER [email protected]
---> Using cache
---> 9bb8fff83697
Step 2 : WORKDIR /opt
---> Using cache
---> 3e3c96d96fc9
Step 3 : ADD docker.etag /tmp/docker.etag
---> Using cache
---> ac3b200c8cdc
Step 4 : RUN cat /tmp/docker.etag
---> Using cache
---> 4fafbeac2180
Step 5 : RUN curl -o docker https://get.docker.com/builds/Linux/x86_64/docker-latest
---> Using cache
---> 95aff324da85
Successfully built 95aff324da85
This strategy has been used to build Node.js, Java and other App servers or pre-built dependencies.
I use a similar but simpler approach:
Let's say I want to add a binary named mybin that can be downloaded from: http://www.example.com/pub/mybin
I do the following in my Jenkins job:
wget -N http://www.example.com/pub/mybin
And in my Dockerfile I have:
COPY mybin /usr/local/bin/
The option -N downloads the binary only when it has changed on the server. The second time I run the wget job I get:
...
Length: 12262118 (12M) [application/octet-stream]
Server file no newer than local file ‘mybin’ -- not retrieving.
And docker build uses the cache.
If the binary changes on the server (when the time stamp changes), wget downloads the binary again, which invalidates the cache for the COPY command.
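Put together, the Jenkins step could be wrapped in a small function like the sketch below. The function name build_with_cached_binary and the image tag "myimage" are assumptions for illustration; the wget and docker build commands are the ones described above.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands from this answer.
build_with_cached_binary() {
    url=$1
    tag=$2
    # -N (timestamping): skip the download unless the server copy is newer
    # than the local file.
    wget -N "$url" || return 1
    # When wget skipped the download, mybin is unchanged and the COPY layer
    # can be taken from the cache.
    docker build -t "$tag" .
}

# Usage (needs network access and a Docker daemon), with an assumed image tag:
# build_with_cached_binary http://www.example.com/pub/mybin myimage
```

Stopping when wget fails avoids building an image from a stale or partially downloaded binary.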