Say you have the following list of packages you would like to install for a Docker image:
("jsonlite","dplyr","stringr","tidyr","lubridate",
"knitr","purrr","tm","cba","caret",
"plumber","httr")
It actually takes around 1 hour to install these!
Any suggestions on how to speed this up? (Or on how to prevent the re-installation at every new image build?)
Side note
I do not install these packages from the dockerfile like this:
RUN Rscript -e "install.packages('stringr')"
...
Instead I create an R script Requirements.R
which installs these packages and simply do:
RUN Rscript Requirements.R
Is this less optimal than installing the packages directly from the Dockerfile?
For faster image builds, BuildKit's cache-mount feature can cache downloaded packages between image rebuilds, even if your dependencies change and the layer needs to be rebuilt. I hope you can use these tips to speed up your Docker image build.
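A minimal sketch of what a cache mount could look like, assuming BuildKit is enabled and a Debian/Ubuntu-based image such as rocker/r-base. The cached paths are the usual apt directories, and the two system libraries are only placeholders taken from the Dockerfile further down; adapt both to your own image.

```dockerfile
# syntax=docker/dockerfile:1
FROM rocker/r-base

# Debian/Ubuntu base images usually ship an apt hook that deletes downloaded
# .deb files after each install; remove it so the cache mount can keep them.
RUN rm -f /etc/apt/apt.conf.d/docker-clean; \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache

# The two cache mounts persist between builds, so even when this layer has to
# be rebuilt, apt can reuse archives it already downloaded.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends libcurl4-openssl-dev libssl-dev
```

Note that this only saves download time for apt packages; R packages that are compiled from source would still be compiled again whenever their layer is rebuilt.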
If you’re here, I presume you have some interest in R package development and/or using Docker, which is a tool for containerizing an environment for running software. So why another blogpost about it?
Using Docker doesn’t have to be all-or-nothing. You can choose to use Docker for deployment, building self-contained images to production, but you don’t have to use it in development. Skip using Docker in your dev environment, or… Only run backing services in containers, or… Provide an example dev environment Dockerfile, or…
Each instruction in your Dockerfile results in an image layer being created. Docker uses layers to reuse work and save bandwidth. A layer is cached and doesn't need to be recomputed if all previous layers are unchanged and, in addition: for COPY instructions, the copied files/folders are unchanged; for all other instructions, the command text is unchanged.
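To illustrate these caching rules, here is a hedged sketch of how the instructions could be ordered. The way the packages are split between the two RUN lines and the /myapp path are only assumptions for the example.

```dockerfile
FROM rocker/r-base

# Slow, rarely changing packages go first: this layer is reused as long as
# the command text (and everything above it) stays the same.
RUN Rscript -e "install.packages(c('dplyr', 'caret'))"

# Packages that change more often go in a later instruction, so editing this
# line does not invalidate the layer above.
RUN Rscript -e "install.packages('plumber')"

# COPY is invalidated whenever the copied files change, so it comes last;
# editing application code then leaves the package layers cached.
COPY . /myapp
```

The trade-off is one layer per RUN instruction: finer-grained caching in exchange for a few more layers in the image.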
Use binary packages where you can, as we often do in the Rocker Project, which provides multiple Dockerfiles for R, including the official r-base one.
If you start from Ubuntu, you get Michael's PPAs with over 3000 packages; if you start from Debian you get fewer from the distro but still many essential ones. (There are some efforts to bring more binary packages to Debian but nothing is up right now.)
Lastly, building from a Dockerfile is of course compile time too. You spend the time once (per container creation) and can re-use the result many times afterwards. Also, by using the Docker Hub you can avoid spending your local CPU cycles.
Edit in Sep 2020: The (updated) Ubuntu PPA now has over 4600 packages for the three most recent LTS releases. Still highly, highly recommended.
I found an article that described how to install R packages from precompiled binaries. It reduced the build time on our Jenkins server from 45 minutes down to 3 minutes.
Here is my Dockerfile:
FROM rocker/r-apt:bionic
WORKDIR /app
RUN apt-get update && \
apt-get install -y libxml2-dev
# Install binaries (see https://datawookie.netlify.com/blog/2019/01/docker-images-for-r-r-base-versus-r-apt/)
COPY ./requirements-bin.txt .
RUN cat requirements-bin.txt | xargs apt-get install -y -qq
# Install remaining packages from source
COPY ./requirements-src.R .
RUN Rscript requirements-src.R
# Clean up apt package lists (as a separate RUN this does not shrink the earlier layers)
RUN rm -rf /var/lib/apt/lists/*
COPY ./src /app
EXPOSE 5000
CMD ["Rscript", "Server.R"]
You can add a file requirements-bin.txt
with package names:
r-cran-plumber
r-cran-quanteda
r-cran-irlba
r-cran-lsa
r-cran-caret
r-cran-stringr
r-cran-dplyr
r-cran-magrittr
r-cran-randomforest
And finally, a requirements-src.R
for packages that are not available as binaries:
pkgs <- c(
'otherpackage'
)
install.packages(pkgs)
I ended up using rocker/r-base as @DirkEddelbuettel suggested. Also, thanks to the question "How to avoid reinstalling packages when building Docker image for Python projects?", I wrote my Dockerfile in a way that doesn't reinstall packages every time I rebuild my Docker image.
I want to share what my Dockerfile looks like now; hopefully this will be of help to others:
FROM rocker/r-base
# install system packages (update and install in the same layer so the
# cached package lists are never stale)
RUN apt-get update && \
    apt-get -y install libcurl4-openssl-dev libssl-dev
# set work directory
WORKDIR /myapp
# copy requirements R script
COPY ./Requirements.R /myapp/Requirements.R
# run requirements R script
RUN Rscript Requirements.R
COPY . /myapp
EXPOSE 8094
ENV NAME=R-test-service
CMD ["Rscript", "my_R_api.R"]