
Is there a persistent location that is always writable which can be used as data cache by a package?

Tags: package, r

Is there a predefined location where an R package could store cached data? The data should persist across sessions. I was thinking about creating a subdirectory of ${R_LIBS_USER}/package_name, but I'm not sure whether this is portable, or whether it is "allowed" when my package is installed system-wide.

The idea is the following: Create an R script mydata.R in the data subdirectory of the package which would be executed by calling data(mydata) (according to the documentation of data()). This script would load the data from the internet and cache it, if it hasn't been cached before. (If the data has been cached already, the cache will be used.) In addition, a function will be provided to invalidate the cache and/or to check if a newer version of the data is available online.

This is from the documentation of data():

Currently, four formats of data files are supported:

  1. files ending ‘.R’ or ‘.r’ are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)

  2. ...

Indeed, creating a file fortytwo.R in the data subdirectory of a package with the following contents:

fortytwo = data.frame(answer=42)

and then executing data(fortytwo) creates a data frame variable fortytwo. Now the question is: Where would fortytwo.R cache the data if it were difficult to compute?
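
The check-the-cache-first logic described above can be sketched in base R as follows. This is only a minimal illustration: cached_value() and invalidate_cache() are hypothetical helper names, and the session-specific tempdir() used in the demo is a stand-in, since the persistent location for cache_dir is exactly what the question is asking about.

```r
# Hypothetical helper: return the cached object if present, otherwise
# compute it, store it as an RDS file, and return it.
# `cache_dir` is an assumption; where it should point is the open question.
cached_value <- function(name, compute, cache_dir) {
  cache_file <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))        # cache hit: skip the computation
  }
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  value <- compute()                   # cache miss: do the expensive work
  saveRDS(value, cache_file)
  value
}

# Hypothetical helper: drop a cached object so it is recomputed next time.
invalidate_cache <- function(name, cache_dir) {
  invisible(unlink(file.path(cache_dir, paste0(name, ".rds"))))
}

# Demo with a session-specific directory (NOT persistent across sessions).
demo_dir <- file.path(tempdir(), "fortytwo-cache")
fortytwo <- cached_value("fortytwo",
                         function() data.frame(answer = 42),
                         demo_dir)
```

A data script such as fortytwo.R could call a helper like this in place of the direct assignment. For reference, R 4.0.0 and later provide tools::R_user_dir("package_name", which = "cache"), which returns a per-user cache path; that function is not mentioned in the original thread.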

EDIT: I am thinking about creating two packages: A "data" package that provides the data, and a "code" package that operates on it. The question concerns the "data" package: Where can it store files in a per-user storage so that it is persistent across R sessions and is accessible from different R projects?

Related: Package that downloads data from the internet during installation.

asked Feb 14 '13 12:02 by krlmlr



1 Answer

R does not define a standard location for package-specific persistent caching. However, the R.cache package provides an interface for creating and managing cached data, and it looks like it could be useful for your scenario.

When users load R.cache (library(R.cache)), they get the following prompt:

The R.cache package needs to create a directory that will hold cache files.
It is convenient to use one in the user's home directory, because it remains
also after restarting R. Do you wish to create the '~/.Rcache/' directory? If
not, a temporary directory (/tmp/RtmpqdUcbP/.Rcache) that is specific to this
R session will be used. [Y/n]:

They can then choose to create the cache directory in their home directory, which is presumably persistent, or to use a session-specific temporary directory. If you make your data package depend on R.cache, you could check for the existence of the cached object(s) in its .onLoad() hook function and download the data if they aren't there. Alternatively, you could do this in the way suggested in your own question.
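
The check-then-download step in an .onLoad() hook might look roughly like the sketch below. It deliberately uses base R only rather than the R.cache API, so it illustrates the logic and not how R.cache itself stores objects. The helper name ensure_downloaded(), the URL, and the cache path under ~/.Rcache are all placeholders chosen for illustration.

```r
# Hypothetical helper: download `url` to `dest` unless `dest` already exists.
# `download` is injectable so the logic can be exercised without a network.
ensure_downloaded <- function(url, dest, download = utils::download.file) {
  if (!file.exists(dest)) {
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    download(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Hook that R runs when the package namespace is loaded.
# The URL and the cache location are placeholders, not real endpoints.
.onLoad <- function(libname, pkgname) {
  ensure_downloaded(
    url  = "https://example.org/mydata.rds",
    dest = file.path("~", ".Rcache", pkgname, "mydata.rds")
  )
}
```

As written, a failed download makes loading the package fail, so a real package would probably want to wrap the call in tryCatch() and defer the error until the data is actually requested.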

answered Nov 07 '22 05:11 by Fabian Fagerholm