How does the use.cache feature of Packrat work?

Tags: r, packrat

Packrat has a use.cache feature to reduce package installation time.

The documentation provides the following info:

use.cache: Install packages into a global cache, which is then shared across projects? The directory to use is read through Sys.getenv("R_PACKRAT_CACHE_DIR"). Not yet implemented for Windows. (logical; defaults to FALSE)

However, running install.packages() doesn't pick up already installed packages from the user's library.

How does use.cache work?

asked Jun 21 '17 12:06 by wab


1 Answer

Installing with Global Cache Enabled

Set up the cache with packrat using the following command:

# Optional: set the location of the cache
# Sys.setenv(R_PACKRAT_CACHE_DIR = "/home/willbowditch/R/packratcache")

packrat::set_opts(use.cache = TRUE)

The setting is written to the project's packrat.opts file, which determines whether the cache is used when the project is opened in RStudio.

auto.snapshot: TRUE
use.cache: TRUE
print.banner.on.startup: auto
vcs.ignore.lib: TRUE
vcs.ignore.src: FALSE
external.packages:
local.repos:
load.external.packages.on.startup: TRUE
ignored.packages:
quiet.package.installation: TRUE
snapshot.recommended.packages: FALSE
snapshot.fields:
    Imports
    Depends
    LinkingTo
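The options file above uses a simple "field: value" layout. As a minimal sketch, assuming the file can be parsed as DCF (my assumption, not something stated in the documentation quoted here), base R's read.dcf() can report whether use.cache is switched on. The sample file is a shortened, hypothetical copy written to a temporary path, and read_use_cache() is an illustrative helper, not part of Packrat.

```r
# Write a shortened, hypothetical copy of packrat.opts to a temporary file.
opts_file <- tempfile(fileext = ".opts")
writeLines(
  c("auto.snapshot: TRUE",
    "use.cache: TRUE",
    "vcs.ignore.lib: TRUE"),
  opts_file
)

# Illustrative helper: read the use.cache field from an options file.
# Returns NA when the field is missing.
read_use_cache <- function(path) {
  opts <- read.dcf(path)
  if (!"use.cache" %in% colnames(opts)) {
    return(NA)
  }
  as.logical(opts[1, "use.cache"])
}

cache_enabled <- read_use_cache(opts_file)
print(cache_enabled)
```

Inside a Packrat project, packrat::get_opts("use.cache") should report the same setting without touching the file directly.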

Installed packages are stored in the cache and symlinked into the project library; base packages are symlinked from the system R library:

./packrat/lib/x86_64-pc-linux-gnu/3.4.0:
total 2
drwxr-xr-x 2 willbowditch staff  4 Jun 14 16:21 .
drwxr-xr-x 3 willbowditch staff  3 Jun 14 16:20 ..
lrwxrwxrwx 1 willbowditch staff 99 Jun 14 16:21 CheckDigit -> /home/willbowditch/R/packratcache/v2/library/CheckDigit/0ab3083cafb11382646fdda41ddb8b98/CheckDigit
lrwxrwxrwx 1 willbowditch staff 93 Jun 14 16:21 packrat -> /home/willbowditch/R/packratcache/v2/library/packrat/6ad605ba7b4b476d84be6632393f5765/packrat

./packrat/lib-ext:
total 9
drwxr-xr-x 2 willbowditch staff 2 Jun 14 16:20 .
drwxr-xr-x 6 willbowditch staff 9 Jun 14 16:20 ..

./packrat/lib-R:
total 24
drwxr-xr-x 2 willbowditch staff 16 Jun 14 16:20 .
drwxr-xr-x 6 willbowditch staff  9 Jun 14 16:20 ..
lrwxrwxrwx 1 willbowditch staff 29 Jun 14 16:20 base -> /usr/local/lib/R/library/base
lrwxrwxrwx 1 willbowditch staff 33 Jun 14 16:20 compiler -> /usr/local/lib/R/library/compiler
lrwxrwxrwx 1 willbowditch staff 33 Jun 14 16:20 datasets -> /usr/local/lib/R/library/datasets
lrwxrwxrwx 1 willbowditch staff 33 Jun 14 16:20 graphics -> /usr/local/lib/R/library/graphics
lrwxrwxrwx 1 willbowditch staff 34 Jun 14 16:20 grDevices -> /usr/local/lib/R/library/grDevices
lrwxrwxrwx 1 willbowditch staff 29 Jun 14 16:20 grid -> /usr/local/lib/R/library/grid
lrwxrwxrwx 1 willbowditch staff 32 Jun 14 16:20 methods -> /usr/local/lib/R/library/methods
lrwxrwxrwx 1 willbowditch staff 33 Jun 14 16:20 parallel -> /usr/local/lib/R/library/parallel
lrwxrwxrwx 1 willbowditch staff 32 Jun 14 16:20 splines -> /usr/local/lib/R/library/splines
lrwxrwxrwx 1 willbowditch staff 30 Jun 14 16:20 stats -> /usr/local/lib/R/library/stats
lrwxrwxrwx 1 willbowditch staff 31 Jun 14 16:20 stats4 -> /usr/local/lib/R/library/stats4
lrwxrwxrwx 1 willbowditch staff 30 Jun 14 16:20 tcltk -> /usr/local/lib/R/library/tcltk
lrwxrwxrwx 1 willbowditch staff 30 Jun 14 16:20 tools -> /usr/local/lib/R/library/tools
lrwxrwxrwx 1 willbowditch staff 30 Jun 14 16:20 utils -> /usr/local/lib/R/library/utils
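The same information as the directory listing above can be gathered from within R. This is a rough sketch using only base R; list_symlinked_packages() is a hypothetical helper name, and the library path in the usage comment is simply the one from the listing.

```r
# Illustrative helper: report which entries of a library directory are
# symlinks and where they point. Sys.readlink() returns "" for entries
# that are not symlinks.
list_symlinked_packages <- function(lib_dir) {
  entries <- list.files(lib_dir, full.names = TRUE)
  targets <- Sys.readlink(entries)
  data.frame(
    package = basename(entries),
    target = targets,
    symlinked = !is.na(targets) & nzchar(targets),
    stringsAsFactors = FALSE
  )
}

# Example call, using the project library path from the listing:
# list_symlinked_packages("packrat/lib/x86_64-pc-linux-gnu/3.4.0")
```

Entries whose target sits under the cache directory came from the cache; entries with an empty target are ordinary directories installed in place.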

If you try to install a package, install.packages() overwrites the symlink rather than fetching the package from the cache, so the cache cannot be used to speed up individual package installs.

> install.packages('CheckDigit')
Installing package into ‘/home/willbowditch/packrattest/packrat/lib/x86_64-pc-linux-gnu/3.4.0’
(as ‘lib’ is unspecified)
trying URL 'https://mran.microsoft.com/snapshot/2017-06-07/src/contrib/CheckDigit_0.1-1.tar.gz'
Content type 'application/octet-stream' length 3777 bytes
==================================================
downloaded 3777 bytes

* installing *source* package ‘CheckDigit’ ...
** package ‘CheckDigit’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (CheckDigit)

The downloaded source packages are in
    ‘/tmp/RtmpxAU8pv/downloaded_packages’

It does, however, speed up the initialisation of Packrat projects when the packages appear in require() or library() calls in the files of the current directory. In that case packrat::init() or packrat::restore() restores the packages from the cache, but only if they have already been used in a cache-enabled Packrat project before.

> packrat::init()
Initializing packrat project in directory:
- "~/six"
Fetching sources for BH (1.62.0-1) ... OK (CRAN current)
Fetching sources for DBI (0.6-1) ... OK (CRAN current)
Fetching sources for R6 (2.2.0) ... OK (CRAN current)
Fetching sources for Rcpp (0.12.10) ... OK (CRAN current)
Fetching sources for assertthat (0.2.0) ... OK (CRAN current)
Fetching sources for dplyr (0.5.0) ... OK (CRAN current)
Fetching sources for lazyeval (0.2.0) ... OK (CRAN current)
Fetching sources for magrittr (1.5) ... OK (CRAN current)
Fetching sources for packrat (0.4.8-1) ... OK (CRAN current)
Fetching sources for stringi (1.1.5) ... OK (CRAN current)
Fetching sources for tibble (1.3.0) ... OK (CRAN current)
Fetching sources for tidyr (0.6.2) ... OK (CRAN current)
Fetching sources for whisker (0.3-2) ... OK (CRAN current)
Snapshot written to '/home/willbowditch/six/packrat/packrat.lock'
Installing BH (1.62.0-1) ... 
    OK (symlinked cache)
Installing DBI (0.6-1) ... 
    OK (symlinked cache)
Installing R6 (2.2.0) ... 
    OK (symlinked cache)
Installing Rcpp (0.12.10) ... 
    OK (symlinked cache)
Installing assertthat (0.2.0) ... 
    OK (symlinked cache)
Installing lazyeval (0.2.0) ... 
    OK (symlinked cache)
Installing magrittr (1.5) ... 
    OK (symlinked cache)
Installing packrat (0.4.8-1) ... 
    OK (symlinked cache)
Installing stringi (1.1.5) ... 
    OK (symlinked cache)
Installing whisker (0.3-2) ... 
    OK (symlinked cache)
Installing tibble (1.3.0) ... 
    OK (symlinked cache)
Installing dplyr (0.5.0) ... 
    OK (symlinked cache)
Installing tidyr (0.6.2) ... 
    OK (symlinked cache)
Initialization complete!

In other words, packages don't seem to go from the global library to the cache, but they can go from other Packrat libraries to the cache.
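To see in advance whether a package will be symlinked from the cache, one can look inside the cache directory. The sketch below assumes the layout visible in the symlink targets shown earlier (cache directory, then v2/library, then package name, then a hash directory); that layout is inferred from the listing and may differ between Packrat versions. cached_builds() is a hypothetical helper.

```r
# Illustrative helper: list the hash directories the cache holds for a
# package. An empty result means the package has not been cached yet.
cached_builds <- function(pkg,
                          cache_dir = Sys.getenv("R_PACKRAT_CACHE_DIR")) {
  pkg_dir <- file.path(cache_dir, "v2", "library", pkg)
  if (!dir.exists(pkg_dir)) {
    return(character(0))
  }
  list.dirs(pkg_dir, full.names = FALSE, recursive = FALSE)
}

# Example call:
# cached_builds("CheckDigit", "/home/willbowditch/R/packratcache")
```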

Quickly installing packages into a Packrat project from the user's home (~) library

As far as I can tell, the cache option cannot shorten install times for packages that have not already been installed in a Packrat project. This can be a problem when installing large packages, such as the tidyverse, from source (as you have to on Linux systems).

There are a couple of workarounds:

Workaround 1: Symlink your library

A straightforward workaround is to symlink the packages in the user's library into an empty Packrat library directory. Installing this way takes a few seconds, and it doesn't seem to interfere with creating a snapshot as long as packrat::clean() is run at the end of development.

Steps

New Project > using packrat

source('https://raw.githubusercontent.com/willbowditch/ratpack/master/R/ratpack.R')
symlink_packages()
#Develop as normal then run 
packrat::clean()
packrat::snapshot(ignore.stale=TRUE) 
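For readers who would rather not source a remote script, the idea behind the symlinking step can be sketched in base R. This is not the code of symlink_packages() from the script above; symlink_user_packages() and its arguments are hypothetical, and the sketch only assumes that both libraries are plain directories containing one folder per package.

```r
# Illustrative helper: symlink every package directory found in user_lib
# into project_lib, skipping packages that already exist there.
# Returns (invisibly) the names of the packages that were linked.
symlink_user_packages <- function(user_lib, project_lib) {
  pkgs <- list.dirs(user_lib, full.names = TRUE, recursive = FALSE)
  links <- file.path(project_lib, basename(pkgs))
  todo <- !file.exists(links)
  if (!any(todo)) {
    return(invisible(character(0)))
  }
  ok <- file.symlink(pkgs[todo], links[todo])
  invisible(basename(pkgs[todo])[ok])
}

# Example call (paths are placeholders):
# symlink_user_packages("~/R/library", "packrat/lib/x86_64-pc-linux-gnu/3.4.0")
```

Skipping existing entries keeps packages that Packrat has already installed or symlinked from the cache untouched.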

Workaround 2: external.packages

Packrat does provide a workaround for large packages with the packrat::set_opts(external.packages=c('pkgname')) command, but packages installed in this way aren't included in the packrat/src folder.

In effect, the option symlinks the package directories into the packrat/lib-ext directory.

I had a go at automating this in the same way as the symlinking option: grab all the packages in the user's home library and add them to the external.packages option.

Steps

New Project > using packrat

source('https://raw.githubusercontent.com/willbowditch/ratpack/master/R/ratpack.R')
import_user_packages()
# All installed packages will now be accessible within the packrat session

To reset at the end of development:

packrat::set_opts(external.packages=NULL)
packrat::snapshot()
packrat::restore() # This step will install the packages if they're not in the cache

The simplest option

Somewhere in between these options might make the most sense: users curate a list of large but commonly used packages to be symlinked (e.g. packrat::set_opts(external.packages=c('tidyverse', 'data.table'))) and then put up with installing smaller packages on a project-by-project basis.
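A curated list is only useful for packages that are actually installed outside the project, so it may help to filter it first. The sketch below uses base R's installed.packages(); pick_external_packages() is a hypothetical helper, and the set_opts() call is left commented out because it only makes sense inside a Packrat project.

```r
# Illustrative helper: keep only those wanted packages that are installed
# in the given library paths.
pick_external_packages <- function(wanted, lib = .libPaths()) {
  installed <- rownames(installed.packages(lib.loc = lib))
  intersect(wanted, installed)
}

curated <- c("tidyverse", "data.table")
available <- pick_external_packages(curated)

# Inside a Packrat project, the filtered list could then be passed on:
# packrat::set_opts(external.packages = available)
```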

answered Nov 02 '22 16:11 by wab