I am building a package for internal use with `devtools`. I would like the package to load in data from a file/connection that differs depending on the date the package is built. The data is large-ish, so paying a one-time cost of parsing and loading it while the package is built is preferable.
Currently, I have a `data.R` file under `R/` that assigns the data to package-level variables; the values are assigned during package installation (or at least that's what appears to be happening). This less-than-ideal setup mostly works, but in order for all instances of the package to have the same data I have to distribute the data file with the package (currently a helper script copies it to `inst/` before building) instead of having it all packaged together. There must be a better way.
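For reference, the setup described above looks roughly like the sketch below. The names `mypkg` and `snapshot.csv` are hypothetical, and whether `system.file()` resolves the path at install time is exactly the "what appears to be happening" part:

```r
# R/data.R -- top-level code in R/ is evaluated once, when the package's
# lazy-load database is built during installation, so every install pays
# the parsing cost. "snapshot.csv" is the file a helper script copies
# into inst/ before building.
pkg_data <- read.csv(
  system.file("snapshot.csv", package = "mypkg", mustWork = TRUE),
  stringsAsFactors = FALSE
)
```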
Such as:

- `data/` (sketched after the tl;dr below)
- `Collate` (I think), but then I have to maintain the order of all of the `.R` files (though with that added complexity I might as well use a Makefile?)

tl;dr: What are some methods for adding a snapshot of dynamically changing data to an R package frozen for deployment?
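For the `data/` route, a minimal sketch of what the build-time helper could do instead (the file path and object name are hypothetical): parse the current extract once, then `save()` it under `data/` so the frozen object ships inside the built package and the parsing cost is paid per build rather than per install.

```r
# run once before R CMD build / devtools::build():
# parse today's extract and freeze it under data/
snapshot <- read.csv("path/to/todays-extract.csv", stringsAsFactors = FALSE)

# saved under data/, the object lazy-loads as `snapshot` after library()
# (given LazyData: true in DESCRIPTION); usethis::use_data(snapshot,
# overwrite = TRUE) is the equivalent one-liner.
save(snapshot, file = "data/snapshot.rda", compress = "xz")
```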
As @BenBolker points out in the comments above, splitting the dataset out into a separate package has precedent in the community (most notably the core `datasets` package) and has additional benefits. The separation of functions from data also makes it easier to work on historic versions of the data with the up-to-date functions.
I currently have a `tools-to-munge` package and a `things-to-munge` package. Using a helper script I can build `tools-to-munge` and set up a `Suggests` (or `Depends`) in the `DESCRIPTION` of both packages to point to the appropriate incrementing version of the packages. After the new `tools-to-munge` package has been built, I can build the `things-to-munge` package as necessary, using the functions in the `tools-to-munge` package.
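A sketch of what that helper script can look like with `devtools` (the version constraint shown in the comment is illustrative):

```r
library(devtools)

# install the frozen tools package first so the data package can be
# built against it
install("tools-to-munge")

# things-to-munge's DESCRIPTION points at the matching tools version, e.g.
#   Depends: tools-to-munge (>= 1.2.0)
# building it produces a versioned .tar.gz snapshot for deployment
build("things-to-munge")
```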