How to save a VERY LARGE .rda file in an R package

I would like to include two 460 x 5000 numeric matrices in my R package. Following the instructions in "How to effectively deal with uncompressed saves during package check?", I saved the objects as:

save(mat1,file="mat1.rda",compress="xz")
save(mat2,file="mat2.rda",compress="xz")

However, the resulting .rda files are quite large (8.7 MB and 8.9 MB), and R CMD check --as-cran gives me this note:

 * checking installed package size ... NOTE
   installed size is 20.1Mb
   sub-directories of 1Mb or more:
   data  20.0Mb

In my understanding, one cannot submit R packages to CRAN that do not "pass" R CMD check --as-cran (i.e., with no notes or warnings). Is there a way to compress the dataset even further?

FairyOnIce asked Apr 22 '14 07:04



2 Answers

Is it really necessary to include those files? I see several options:

  • Include a smaller subset of the matrix, which you use in the examples.
  • Generate a matrix on-the-fly, e.g. with random numbers.
  • Put the files somewhere for download, and ensure that the examples do not execute.
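The first two options could look roughly like the sketch below. It is only an illustration: `mat1` here is random stand-in data (not the asker's real matrix), the 50 x 200 subset size is arbitrary, and `make_example_matrix()` is a hypothetical helper name.

```r
# Stand-in for the real 460 x 5000 matrix (random values, for illustration only)
set.seed(42)
mat1 <- matrix(rnorm(460 * 5000), nrow = 460, ncol = 5000)

# Option 1: ship only a small subset that the examples actually need
mat1_small <- mat1[1:50, 1:200]

# Save both versions to temporary files and compare their on-disk sizes
full_file  <- tempfile(fileext = ".rda")
small_file <- tempfile(fileext = ".rda")
save(mat1, file = full_file, compress = "xz")
save(mat1_small, file = small_file, compress = "xz")
print(file.size(c(full_file, small_file)) / 1024^2)  # sizes in MB

# Option 2: generate the matrix on the fly instead of shipping it
# (note: set.seed() changes the global RNG state of the session)
make_example_matrix <- function(n_row = 460, n_col = 5000, seed = 1) {
  set.seed(seed)
  matrix(rnorm(n_row * n_col), nrow = n_row, ncol = n_col)
}
print(dim(make_example_matrix()))
```

Whether a subset or simulated data is acceptable depends on what the examples need to demonstrate; if they rely on the structure of the real data, the download or separate-package routes are probably a better fit.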
Paul Hiemstra answered Sep 23 '22 21:09


Consider distributing the data in a separate data package that will be built, uploaded and installed only once (hopefully). Compare this to the efforts required to retransfer the same data over and over again as you update your package.

(Of course, this applies only if you intend to supply updates to your package. There's no difference if your code is perfect right from the start ;-) )
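As a rough sketch of how the main package might use such a data package: the name `mypkgdata` and the helper `load_big_matrix()` are made up for illustration, and the sketch assumes the data package stores its objects in `data/` as usual. The main package would typically list the data package under `Suggests` and check for it at run time.

```r
# "mypkgdata" is a hypothetical name for the separate data package
load_big_matrix <- function(name = "mat1", pkg = "mypkgdata") {
  # Fail with a clear message if the data package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this data set; please install it.",
         call. = FALSE)
  }
  # Load the data set into a private environment and return the object
  env <- new.env()
  utils::data(list = name, package = pkg, envir = env)
  env[[name]]
}

# The same helper works with any installed package that ships data sets,
# e.g. the built-in 'datasets' package:
head(load_big_matrix("mtcars", pkg = "datasets"))
```

Examples that need the large matrices can then be wrapped in a check such as `if (requireNamespace("mypkgdata", quietly = TRUE)) { ... }` so that they are skipped when the data package is absent.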

krlmlr answered Sep 20 '22 21:09