I use Docker in both development and production, and one thing that really bugs me is how simplistic the Docker build cache is. I have a Ruby application that requires bundle install to install its dependencies, so I start with the following Dockerfile:
ADD Gemfile Gemfile
ADD Gemfile.lock Gemfile.lock
RUN bundle install --path /root/bundle
All dependencies are cached and it works great until I add a new gem. Even if the gem I add is just 0.5 MB, it still takes 10-15 minutes to install all of the application's gems from scratch, and then another 10 minutes to deploy due to the size of the dependencies folder (about 300 MB).
I have encountered exactly the same problem with node_modules and npm. I was wondering, has anyone found a solution to this problem?
My research results so far:
Source to image - caches arbitrary files across incremental builds. Unfortunately, due to the way it works, it requires pushing the whole 300 MB to a registry even when the gems have not changed. Faster build -> slower deploy, even when gems are not updated.
Gemfile.tip - split the Gemfile into two files and only add new gems to one of them. A very bundler-specific solution, and I am not convinced it will scale beyond adding 1-2 gems.
Harpoon - would be a good fit if not for the fact that it forces you to ditch the Dockerfile and switch to its own format. That means extra pain for every new dev on the team, since the toolset has to be learned separately from Docker.
Temporary package cache. This is just an idea I had; I am not sure it is possible. Somehow bring the package manager cache (not the dependencies folder) to the machine before installing packages, and then remove it. Based on my hack, it significantly speeds up package installation for both bundler and npm without bloating the machine with unnecessary cache files.
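For what it's worth, this "temporary package cache" idea is essentially what BuildKit cache mounts provide (they require building with DOCKER_BUILDKIT=1 on a reasonably recent Docker). A minimal sketch, assuming the official ruby image, where bundler/RubyGems keep downloaded .gem files under /usr/local/bundle/cache:
# syntax=docker/dockerfile:1
FROM ruby:2.6
WORKDIR /app
COPY Gemfile Gemfile.lock ./
# The cache mount persists downloaded .gem files between builds
# without baking them into the image layers themselves.
RUN --mount=type=cache,target=/usr/local/bundle/cache \
    bundle install --jobs 4
Installed gems still end up in a normal image layer; only the download cache lives in the mount, so the image does not grow while repeat installs skip re-downloading.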
Disabling caching: you can do so by passing two arguments to docker build:
--pull : pulls the latest version of the base Docker image, instead of using the locally cached one.
--no-cache : ensures all additional layers in the Dockerfile get rebuilt from scratch, instead of relying on the layer cache.
Docker's build cache is a handy feature. It speeds up Docker builds by reusing previously created layers. You can use the --no-cache option to disable caching entirely, or use a custom Docker build argument to force rebuilding from a certain step onwards.
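A minimal sketch of the build-argument trick (CACHEBUST is an arbitrary name, not a Docker built-in): every layer above the ARG line keeps coming from the cache, while a changed argument value invalidates every RUN below it.
FROM ruby:2.6
COPY Gemfile Gemfile.lock ./
# Layers above this line are cached as usual.
ARG CACHEBUST=1
# A changed CACHEBUST value causes a cache miss here and below.
RUN echo "cache bust: $CACHEBUST" && bundle install
To force the install step to rerun, pass a fresh value at build time, e.g. docker build --build-arg CACHEBUST=$(date +%s) .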
In a default install, Docker's images and layers are located in /var/lib/docker; this is where Docker stores base images. During a new build, all of these file structures have to be created and written to disk. Once created, the container (and any subsequent ones) is stored in the same area.
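To see what is actually sitting in that storage, the standard CLI gives a quick overview (app below is just the example image tag used later in this thread):
# Disk usage broken down by images, containers, local volumes (and, on newer versions, build cache)
docker system df
# The individual layers, and their sizes, that make up an image
docker history app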
I found two possible solutions that use an external data volume for gem storage: one and two.
Briefly, in your docker-compose.yml you specify the mount point for BUNDLE_PATH via volumes_from, run bundle check || bundle install on startup, and things are good to go. This is one possible solution; however, to me it feels like it goes slightly against the Docker way. Specifically, bundle install sounds like it should be part of the build process and shouldn't be part of the runtime. Other things that depend on the bundle install, like assets:precompile, are now runtime tasks as well.
This is a viable solution, but I'm looking for something a little more robust.
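A minimal sketch of that data-volume approach, using a named volume rather than volumes_from (all names here are illustrative, not taken from the linked solutions):
# docker-compose.yml (illustrative)
version: '2'
services:
  web:
    build: .
    command: 'bash -c "bundle check || bundle install && bin/rails server -b 0.0.0.0"'
    environment:
      BUNDLE_PATH: '/bundle'
    volumes:
      - bundle_cache:/bundle
volumes:
  bundle_cache:
The gems survive container rebuilds because they live in the bundle_cache volume, which is exactly why the install moves from build time to runtime.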
I cache the gems to a tar file in the application's tmp directory. Then I copy the gems into a layer using the ADD command before doing the bundle install. From my Dockerfile.yml:
WORKDIR /home/app
# restore the gem cache. This only runs when
# gemcache.tar.bz2 changes, so usually it takes
# no time
ADD tmp/gemcache.tar.bz2 /var/lib/gems/
COPY Gemfile /home/app/Gemfile
COPY Gemfile.lock /home/app/Gemfile.lock
RUN gem update --system && \
    gem update bundler && \
    bundle install --jobs 4 --retry 5
Be sure you are sending the gem cache to your docker machine. My gemcache is 118 MB, but since I am building locally it copies fast. My .dockerignore:
tmp
!tmp/gemcache.tar.bz2
You need to cache the gems from a built image, but initially you may not have an image. Create an empty cache like so (I have this in a rake task):
task :clear_cache do
  sh "tar -jcf tmp/gemcache.tar.bz2 -T /dev/null"
end
After the image is built, copy the gems to the gem cache. My image is tagged app. I create a docker container from the image, copy /var/lib/gems/2.2.0 into my gemcache using the docker cp command, and then delete the container. Here's my rake task:
task :cache_gems do
  id = `docker create app`.strip
  begin
    sh "docker cp #{id}:/var/lib/gems/2.2.0/ - | bzip2 > tmp/gemcache.tar.bz2"
  ensure
    sh "docker rm -v #{id}"
  end
end
On the subsequent image build, the gemcache is copied to a layer before the bundle install is called. This takes some time, but it is faster than a bundle install from scratch.
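Putting the pieces together, the workflow might look like this (assuming the rake tasks above and an image tagged app):
rake clear_cache          # create an empty gemcache.tar.bz2 before the very first build
docker build -t app .     # first build: full bundle install, nothing cached yet
rake cache_gems           # snapshot the installed gems out of the built image
docker build -t app .     # later builds reuse the ADDed gem cache layer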
Builds after that are even faster because Docker has cached the ADD tmp/gemcache.tar.bz2 /var/lib/gems/ layer. If there are any changes to Gemfile.lock, only those changes are built.
There is no reason to rebuild the gem cache on each Gemfile.lock change. Once there are enough differences between the cache and the Gemfile.lock that a bundle install is slow, you can rebuild the gem cache. When I do want to rebuild it, it is a simple rake cache_gems command.
The "copy local dependencies" approach (accepted answer) is a bad idea IMO. The whole point of dockerizing your environment is to have an isolated, reproducible environment.
Here's how we are doing it.
# .docker/docker-compose.dev.yml
version: '3.7'
services:
  web:
    build: .
    command: 'bash -c "wait-for-it cache:1337 && bin/rails server"'
    depends_on:
      - cache
    volumes:
      - bundle:/bundle
    environment:
      BUNDLE_PATH: '/bundle'
  cache:
    build:
      context: ../
      dockerfile: .docker/cache.Dockerfile
    volumes:
      - bundle:/bundle
    environment:
      BUNDLE_PATH: '/bundle'
    ports:
      - "1337:1337"
volumes:
  bundle:
# .docker/cache.Dockerfile
FROM ruby:2.6.3
RUN apt-get update -qq && apt-get install -y netcat-openbsd
COPY Gemfile* ./
COPY .docker/cache-entrypoint.sh ./
RUN chmod +x cache-entrypoint.sh
ENTRYPOINT ./cache-entrypoint.sh
# .docker/cache-entrypoint.sh
#!/bin/bash
bundle check || bundle install
nc -l -k -p 1337
# web.dev.Dockerfile
FROM ruby:2.6.3
RUN apt-get update -qq && apt-get install -y nodejs wait-for-it
WORKDIR ${GITHUB_WORKSPACE:-/app}
# Note: bundle install step removed
COPY . ./
This is similar to the concept explained by @EightyEight, but it doesn't put bundle install into the main service's startup; instead, the update is managed by a separate service. Either way, don't use this approach in production. Running services without their dependencies being installed in the build step will, at the very least, cause more downtime than necessary.
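For completeness, one possible way to bring this setup up locally (assuming the files are laid out under .docker/ as the paths above suggest, and that the web service's build section actually points at web.dev.Dockerfile):
docker-compose -f .docker/docker-compose.dev.yml up --build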