
Load data object when package is loaded

Tags: r

Is there a way to automatically load a data object from a package into memory when the package is loaded (but not yet attached), i.e. the opposite of lazy loading? The object is used in one of the package functions, so it needs to be available at all times.

When the package sets LazyData: false in DESCRIPTION, the data object is not exported by the package at all and needs to be loaded manually with data(). We could use something like:

.onLoad <- function(lib, pkg){
  data(mydata, package = pkg)
}

However, data() loads the object in the global environment. I strongly prefer to load it in the package environment (which is what lazydata does) to prevent masking conflicts.

A workaround is to bypass the data mechanism completely and simply hardcode the object in the package code. The package file myscore.R would then look like:

mymodel <- readRDS("inst/mymodel.rds")
myscore <- function(newdata){
  predict(mymodel, newdata)
}

But this will lead to a huge package database (the lazy-load file that holds the package's R objects) for large data objects, and I am not sure what the consequences of that are.

Asked by Jeroen Ooms, Jun 22 '14 19:06


2 Answers

As you say:

"The object is used in one of the package functions, so it needs to be available at all times."

I think the author of that package should really NOT use data() for that. Instead, the object should be defined inside the package's R/ directory, either with plain R code in an R/*.R file or with the sysdata.rda approach explained in "Writing R Extensions", the famous first reference for all such questions. In both cases the package author can also export the object, which is often desirable for other users, as in your case.

Of course this needs a polite conversation between you and the package author, and will only apply to the next version of that package.
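The sysdata.rda approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the package author's actual build script: the package directory `mypkg`, the temporary location, and the `lm()` model standing in for `mymodel` are all assumptions made for the example.

```r
# Hypothetical package source tree, created in a temp dir for illustration.
pkg_root <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg_root, "R"), recursive = TRUE, showWarnings = FALSE)

# Stand-in for the real model object (assumption: any R object works here).
mymodel <- lm(mpg ~ wt, data = mtcars)

# Objects saved in R/sysdata.rda are made available inside the package
# namespace when the package is installed; no data() call is needed.
sysdata_path <- file.path(pkg_root, "R", "sysdata.rda")
save(mymodel, file = sysdata_path)

# Sanity check: reload the file into a scratch environment.
check_env <- new.env()
loaded_names <- load(sysdata_path, envir = check_env)
```

Package functions such as myscore() can then refer to mymodel directly, because it lives in the namespace alongside them.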

Answered by Martin Mächler, Nov 11 '22 01:11


I'm going to post this since it seems to work for my use case.

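.onLoad() is:

function(libname, pkgname) {
  # Load mydata into the package namespace, not the global environment.
  # Note that the argument of data() is called envir.
  data(mydata, package = pkgname,
       envir = parent.env(environment()))
}

You also need Imports: utils in DESCRIPTION and importFrom(utils, data) in NAMESPACE in order to pass R CMD check.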
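To see why parent.env(environment()) points at the package namespace, here is a small sketch that can be run outside a package. It assumes a plain environment (`fake_ns`, a hypothetical stand-in for the namespace) and uses the `mtcars` dataset from the datasets package in place of mydata.

```r
# Stand-in for a package namespace (hypothetical; a real package has one already).
fake_ns <- new.env()

# Mimics .onLoad(): the function's enclosure is the "namespace", so
# parent.env(environment()) inside the call resolves to fake_ns.
on_load <- function(libname, pkgname) {
  utils::data("mtcars", package = pkgname,
              envir = parent.env(environment()))
}
environment(on_load) <- fake_ns

on_load(NULL, "datasets")

# The object was loaded into fake_ns, and no copy was placed in the global environment.
in_ns <- exists("mtcars", envir = fake_ns, inherits = FALSE)
in_global <- exists("mtcars", envir = globalenv(), inherits = FALSE)
```

Inside a real .onLoad(), the enclosure of the function is the package namespace, so the same call places the object next to the package's own functions.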

In my case I don't need the data object to be visible to the user; I need it to be visible to one of the functions in the package. If you need it visible to the user, that's going to be even harder (I think) because, as far as I can tell, you can't export data, just functions. The only way I've thought of to export data is to export a wrapper function for the data.
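The wrapper-function idea could look something like the sketch below. The names `get_mydata` and `fake_ns` and the small data frame are hypothetical; in a real package the function would be defined in an R/*.R file and listed as export(get_mydata) in NAMESPACE.

```r
# Stand-in for the package namespace holding the internal data object.
fake_ns <- new.env()
assign("mydata", data.frame(x = 1:3), envir = fake_ns)

# Accessor that would be exported; it finds mydata through its enclosure.
get_mydata <- function() mydata
environment(get_mydata) <- fake_ns

result <- get_mydata()
```

Users then call get_mydata() instead of referring to the object by name, so nothing is written to their workspace.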

Answered by Ben Bolker, Nov 11 '22 02:11