Let's say I have a repository with multiple projects, structured like this:
Root
├── bar
│   ├── Dockerfile
│   └── index.js
├── baz
│   ├── Dockerfile
│   └── index.js
├── foo
│   ├── Dockerfile
│   └── index.js
└── shared
    ├── utils.js
    └── shared.js
The foo, bar and baz projects share some libraries in the shared folder. Currently, I'm sending the root folder as context when building these three Docker images, so that the shared folder is included.
To decrease build time and reduce deployment time of my Docker images, I need to get the minimum size of context sent to these images.
In order to do so, I plan on making a temporary folder for each image that will be used as its build context. The thing is, I need to know which shared files are used by each image.
In this example, it's quite simple because there are few shared files and few projects. But in reality, there are hundreds of shared files and about 20 projects, and I don't want to check by hand which shared files are used by which projects.
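For example, something like this is what I have in mind for foo (the shared-file list here is only a placeholder, since knowing which shared files foo actually needs is exactly the problem):
# Rough sketch: assemble a minimal build context for foo, then build from it
# instead of from Root. Which shared files to copy is what I don't know yet.
rm -rf /tmp/foo-context && mkdir -p /tmp/foo-context/shared
cp -r foo/. /tmp/foo-context/
cp shared/utils.js /tmp/foo-context/shared/   # placeholder: only the shared files foo actually uses
docker build -t foo:latest /tmp/foo-context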
Here is an example of my Dockerfile:
FROM node:boron
RUN mkdir /app
WORKDIR /app
COPY package.json package.json
RUN yarn
COPY . .
RUN yarn release
CMD node release/server.js
And I build the Docker image with:
docker build -t foo:latest ..
Note the .. that points to the Root folder. This results in all the shared files being sent as part of the context, even those that are not needed.
Is there an easy way to know which files of the context sent to Docker are actually used by the build and which are not?
Before I begin, let me clear up a few misconceptions and define some terminology for users new and old. First off, Docker images are more or less snapshots of a container's configuration. Everything from the filesystem to the network configuration is contained within an image and can be used to quickly create new instances (containers) of said image.
Containers are running instances of a particular image, and that is where all the magic happens. Docker containers can be viewed as tiny virtual machines, but unlike virtual machines, the system resources are shared with the host, and containers have a few other features that VMs do not readily have. You can get more information about this in another Stack Overflow post.
Building an image is done either by saving a container (docker commit *container* *repoTag*) or by building from a Dockerfile, which is a set of automated build instructions, as if you were making the changes to a container yourself. It also gives end users a running "transaction" of all the commands needed to get your app running.
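For illustration, a rough example of both routes (the container and tag names here are made up):
# Route 1: make changes to a running container by hand, then snapshot it as an image
docker run -d --name foo-dev node:boron tail -f /dev/null
docker exec foo-dev mkdir -p /app          # ...whatever manual changes you like
docker commit foo-dev foo:manual-snapshot
# Route 2: record the same steps in a Dockerfile and let docker build replay them
docker build -t foo:latest .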
To decrease build time ... of my Docker images
Correct me if I am wrong, but it would seem that you are trying to build your image for each new container. Docker images are only needed to spin up a container. Yes, building them does take a while, especially from Dockerfiles, but once they are built, it takes a trivial amount of time to spin up a container with your desired app, which is really all you need. Again, Docker images are save states of previous container configurations, and loading a save state does not and should not consume a lot of time, so you really shouldn't be concerned with a Dockerfile's build time.
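In other words, the expensive step happens only once; after that you simply reuse the image (names below are placeholders):
# Build once (this is the slow part)...
docker build -t foo:latest .
# ...then spinning up containers from the finished image is nearly instant:
docker run -d --name foo-1 foo:latest
docker run -d --name foo-2 foo:latest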
~~~~~~~~~~~~~~~~~~~~~~~~~~
Despite this, working to decrease a Dockerfile's build time and a container's final size is still a valid question, and turning to automated dependency resolution is a common approach. In fact, I asked a similar question nearly 2 years ago, so it may contain some information that can aid in this endeavor. However...
To decrease build time and reduce deployment time of my Docker images, I need to get the minimum size of context sent to these images.
To which Taco, a person who answered my earlier question, would have replied
Docker isn't going to offer you painless builds. Docker doesn't know what you want.
Yes, it certainly would be less of a hassle if Docker knew what you wanted from the get-go, but the fact remains that you need to tell it exactly what you want if you are aiming for the best size and the best build time. However, there is more than one way to obtain the best build time and/or build size.
If I were to write this particular strategy in a Dockerfile, it might go a little something like this, assuming the container is built from a Linux system:
#ASSUMING LINUX CONTAINER!
...
WORKDIR path/to/place/project
RUN mkdir dependencyTemp
COPY path/to/project/and/dependencies/ .
#Next part is written in pseudo code for the time being
RUN <move all dependencies to dependencyTemp> \
 && <run app, store state and logs> \
 && while [ "$appState" != "running" ]; do \
      <move next dependency back into the project> && <run app, store new logs>; \
      if [ "$logsOriginal" = "$logsNew" ]; then <remove that dependency again>; \
      else <keep the dependency> && logsOriginal=$logsNew; fi; \
    done
This, however, is terribly inefficient, as you are starting and stopping your application internally to find the dependencies needed for your app, resulting in a terribly long build time. True, it would somewhat counter the issue of finding the dependencies yourself and reduce some size, but it may not work 100% of the time, and it is probably going to take less time for you to find what dependencies are needed to run your app than to design the code to close that gap.
Another option is to keep the shared dependencies in a dedicated container or volume that the other containers mount. However, should the dependency container go down, the other apps would go down as well, which may not make for a stable system in the long run. Additionally, you would have to stop and restart every container every time you needed to add a new dependency or project.
Since it is a live mount, you can add dependencies and files to update all the apps that need them simultaneously, as an added bonus. However, volumes do not work very well when you are looking to scale your projects beyond your local system, and they are subject to local tampering.
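As a rough sketch of that live-mount idea (the /app/shared target path is an assumption about where your apps expect the shared code to be):
# Mount the shared folder into each container instead of baking it into every image.
docker run -d --name foo -v "$(pwd)/shared:/app/shared:ro" foo:latest
docker run -d --name bar -v "$(pwd)/shared:/app/shared:ro" bar:latest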
~~~~~~~~~~~~~~~~~~
The bottom line is that Docker cannot auto-resolve dependencies for you, and the workarounds are far too complicated and/or time consuming to even remotely consider for your desired solution, since it would be much faster if you were to figure out and specify the dependencies yourself. If you want to go out and develop that strategy yourself, go right ahead.
The only way to know whether the application inside a Docker image uses a specific file is to know the application or to analyze its logs from a previous run.
I'll propose another way to solve your problem. It will reduce build time and image size but not necessarily deploy time.
You'll build a base image, containing the shared libraries, that all your other images are based on:
FROM node:boron
COPY shared /shared
And build it with:
docker build -t erazihel/base:1.0 .
You should base all the other images on that image:
FROM erazihel/base:1.0
RUN mkdir /app
WORKDIR /app
COPY package.json package.json
RUN yarn
COPY . .
RUN yarn release
CMD node release/server.js
Since Docker images are layered, the base image will only exist once on each deployment server, and the additional layer each new Docker image adds is very small. Build time should also decrease, since there is no COPY/ADD done for the shared libraries.
There isn't any cost in having one big base image since all the following images are much smaller. In fact, you'll likely save space.
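As a rough sketch of the resulting workflow, assuming each project keeps its own Dockerfile in its own folder as in your layout:
# Build the shared base image once...
docker build -t erazihel/base:1.0 .
# ...then build each project with only its own folder as context:
docker build -t foo:latest foo/
docker build -t bar:latest bar/
docker build -t baz:latest baz/
# docker history shows the base layers being reused by every child image:
docker history foo:latest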
What you can do is use inotify. It's a kernel feature to sniff what is happening on the fly at the filesystem level. It should go something like this:
Use this script, inotify.sh (don't forget chmod +x inotify.sh):
#!/bin/sh
# Directory to watch; adjust to where your application code lives inside the container.
DIRTOMONITOR=/src
# Install inotify-tools (Alpine first, fall back to Debian/Ubuntu).
apk add --update inotify-tools || (apt-get update && apt-get install -y inotify-tools)
# Log every file access under $DIRTOMONITOR in the background...
inotifywait -mr --timefmt '%H:%M' --format '%T %w %e %f' -e access "$DIRTOMONITOR" &
# ...then run the original command (passed as arguments to this script).
"$@"
Run your application, for example:
docker run \
-v $(pwd)/inotify.sh:/inotify.sh \
--entrypoint /inotify.sh \
<your-image> \
node server.js
Watches established.
12:34 /src/ ACCESS server.js <---------
Server running at http://localhost:4000
Each read/written file will be shown as ACCESS.
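If you then want to turn that output into a list of the files that were actually touched, a small post-processing step could look like this (a sketch only; <container> is a placeholder, and the field layout follows the '%T %w %e %f' format above):
# Collect the unique files reported as ACCESS during the run
docker logs <container> 2>&1 | awk '$3 ~ /ACCESS/ { print $2 $4 }' | sort -u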
If your 'shared' files are not used in a child image, then they don't belong in the shared folder. By building a base image with the shared files, you can run the build for each image in its own folder/context, without the issues you mention.