How to speed up R packages installation in docker

Say you have the following list of packages you would like to install in a Docker image:

c("jsonlite", "dplyr", "stringr", "tidyr", "lubridate",
  "knitr", "purrr", "tm", "cba", "caret",
  "plumber", "httr")

It actually takes around 1 hour to install these!

Any suggestions on how to speed this up? (Or how to avoid reinstalling them on every new image build?)

Side note

I do not install these packages from the Dockerfile like this:

RUN Rscript -e "install.packages('stringr')"
...

Instead, I create an R script, Requirements.R, that installs these packages, and simply run:

RUN Rscript Requirements.R

Is this less optimal than installing the packages directly from the Dockerfile?

asked Jul 24 '18 13:07 by AnarKi


3 Answers

Use binary packages where you can, as we often do in the Rocker Project, which provides multiple Dockerfiles for R, including the official r-base one.

If you start from Ubuntu, you get Michael Rutter's PPAs with over 3000 packages; if you start from Debian, you get fewer from the distro, but still many essential ones. (There are some efforts to bring more binary packages to Debian, but nothing is available right now.)

Lastly, building from a Dockerfile is of course compile time too: you spend the time once per image build and can then reuse the image potentially many times. Also, by building on Docker Hub you can avoid spending your local CPU cycles.

Edit in Sep 2020: The (updated) Ubuntu PPA now has over 4600 packages for the three most recent LTS releases. Still highly, highly recommended.

answered Oct 17 '22 20:10 by Dirk Eddelbuettel


I found an article that described how to install R packages from precompiled binaries. Switching to binaries reduced the build time on our Jenkins server from 45 minutes to 3 minutes.

Here is my Dockerfile:

FROM rocker/r-apt:bionic
WORKDIR /app

# Install system libraries and R binaries (see https://datawookie.netlify.com/blog/2019/01/docker-images-for-r-r-base-versus-r-apt/)
# Deleting the apt lists in the same RUN keeps them out of every image layer;
# a separate "RUN rm -rf" layer would not make the image smaller
COPY ./requirements-bin.txt .
RUN apt-get update && \
  apt-get install -y libxml2-dev && \
  xargs apt-get install -y -qq < requirements-bin.txt && \
  rm -rf /var/lib/apt/lists/*

# Install remaining packages from source
COPY ./requirements-src.R .
RUN Rscript requirements-src.R

COPY ./src /app

EXPOSE 5000
CMD ["Rscript", "Server.R"]

Add a file requirements-bin.txt listing the binary package names, one per line:

r-cran-plumber
r-cran-quanteda
r-cran-irlba
r-cran-lsa
r-cran-caret
r-cran-stringr
r-cran-dplyr
r-cran-magrittr
r-cran-randomforest

And finally, a requirements-src.R for packages that are not available as binaries:

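pkgs <- c(
    'otherpackage'
)

# Set the CRAN mirror explicitly: a non-interactive Rscript cannot prompt for one
install.packages(pkgs, repos = 'https://cloud.r-project.org')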
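To decide which of the question's packages belong in requirements-bin.txt and which in requirements-src.R, a small POSIX shell sketch like the one below could be used. It assumes the Debian/Ubuntu convention visible in the list above (R package `randomForest` becomes `r-cran-randomforest`) and takes a file of available binary names, which inside the base image could be produced with `apt-cache pkgnames r-cran-`. The function name `split_requirements` and its arguments are hypothetical, not part of the answer; packages whose binary is named differently simply fall through to the source list.

```shell
#!/bin/sh
# Hypothetical helper: split R package names into apt binaries
# (requirements-bin.txt) and source installs (requirements-src.R).
# Assumes binary packages are named r-cran-<lowercased R package name>.

# split_requirements AVAILABLE_FILE OUT_DIR PKG...
#   AVAILABLE_FILE: one binary package name per line
split_requirements() {
  local available=$1 out=$2
  shift 2
  local src="" pkg bin
  : > "$out/requirements-bin.txt"   # start with an empty binary list
  for pkg in "$@"; do
    bin="r-cran-$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')"
    if grep -qxF "$bin" "$available"; then
      printf '%s\n' "$bin" >> "$out/requirements-bin.txt"
    else
      # build a comma-separated list of quoted R strings
      src="${src:+$src, }'$pkg'"
    fi
  done
  # c() with no arguments is NULL in R, so the length check skips empty lists
  printf 'pkgs <- c(%s)\n\nif (length(pkgs) > 0) install.packages(pkgs, repos = "https://cloud.r-project.org")\n' "$src" \
    > "$out/requirements-src.R"
}
```

For example, `apt-cache pkgnames r-cran- > available.txt` followed by `split_requirements available.txt . jsonlite dplyr stringr cba` would write both files into the current directory, ready to be copied by the Dockerfile above.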

answered Oct 17 '22 20:10 by Jodiug


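I ended up using rocker/r-base, as @DirkEddelbuettel suggested. Also, thanks to the question "How to avoid reinstalling packages when building Docker image for Python projects?", I wrote my Dockerfile so that it doesn't reinstall the packages every time I rebuild the image: Requirements.R is copied and run on its own, before the rest of the source is copied in, so Docker's layer cache reuses the installation layer as long as Requirements.R and the steps before it are unchanged.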

I want to share what my Dockerfile looks like now; hopefully it will help others:

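FROM rocker/r-base

# install system libraries (e.g. for the curl and openssl R packages);
# update, install and clean up in one RUN so a stale, cached package list is never reused
RUN apt-get update && \
  apt-get -y install libcurl4-openssl-dev libssl-dev && \
  rm -rf /var/lib/apt/lists/*

# set work directory
WORKDIR /myapp

# copy only the requirements R script first, so that the install layer
# below is reused from cache until Requirements.R itself changes
COPY ./Requirements.R /myapp/Requirements.R

# run requirements R script
RUN Rscript Requirements.R

# copy the rest of the app; changes here do not invalidate the layers above
COPY . /myapp

EXPOSE 8094

ENV NAME=R-test-service

CMD ["Rscript", "my_R_api.R"]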
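Why this ordering avoids reinstalling can be illustrated with a deliberately simplified model of Docker's build cache, written as a POSIX shell sketch. It is a toy under stated assumptions, not Docker's actual algorithm: each instruction gets a key derived from the previous key, the instruction text and, for COPY, a checksum of the copied files. The names `hash_path` and `layer_keys` are hypothetical, and multi-source COPY, `.dockerignore`, file metadata and `\` line continuations (each physical line counts as one step) are ignored.

```shell
#!/bin/sh
# Toy model of Docker's layer cache, NOT Docker's real algorithm: each
# instruction's key chains the previous key, the instruction text and, for
# COPY, a checksum of the copied files. A changed key means that layer (and
# every layer after it) would be rebuilt.

# hash_path DIR SRC: one checksum over every file under SRC (relative to DIR)
hash_path() {
  (cd "$1" && find "$2" -type f -exec sha256sum {} + | LC_ALL=C sort) \
    | sha256sum | cut -d' ' -f1
}

# layer_keys DIR: print one cache key per instruction line in DIR/Dockerfile
layer_keys() {
  local prev="" line extra src key
  while IFS= read -r line; do
    case $line in
      '' | '#'*) continue ;;  # skip blank lines and comments
    esac
    extra=""
    case $line in
      COPY*)
        # second word is the source path (single-source COPY only)
        src=$(printf '%s\n' "$line" | awk '{print $2}')
        extra=$(hash_path "$1" "$src")
        ;;
    esac
    key=$(printf '%s|%s|%s' "$prev" "$line" "$extra" | sha256sum | cut -d' ' -f1)
    printf '%s\n' "$key"
    prev=$key
  done < "$1/Dockerfile"
}
```

In this model, with the ordering used above, editing my_R_api.R changes only the key of the final `COPY . /myapp` step, so `RUN Rscript Requirements.R` keeps its key; editing Requirements.R changes the key of its own COPY step and therefore of the install step and everything after it.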

answered Oct 17 '22 21:10 by AnarKi