Exclude data sets from R package build

I'm developing an R package that has several large .rda data files in its 'data' folder.

When I build the package (with R CMD build, to create the .tar.gz archive), the data files are included in the package too. Since they are really big, this makes the build (as well as the check) process very slow and the final package needlessly large.

The data are downloaded from a database through one of the package's functions, so the intent is not to ship them with the package but to let users populate the data folder from their own database. The data I have are only for testing, and it makes no sense to include them in the package.

In short: is it possible to keep the data in the 'data' folder but exclude them from the built package?

Edit

OK, I found a partial solution: creating a file named .Rbuildignore that contains the line:

^data/.+$
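According to the Writing R Extensions manual, each line of .Rbuildignore is a Perl-like regular expression matched case-insensitively against file and directory names relative to the package's top-level source directory. A rough way to preview what `^data/.+$` excludes is to run the same pattern through `grep -Ei`. This is only an approximation (grep uses POSIX extended regexes, not Perl ones, though they agree for a simple pattern like this), and the helper name `ignored_paths` and the file list are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the paths (read from stdin, one per line) that
# match the .Rbuildignore-style pattern given as $1.
# Approximation: R matches Perl-like regexes case-insensitively; `grep -Ei`
# matches POSIX extended regexes case-insensitively.
ignored_paths() {
    grep -Ei "$1"
}

# Hypothetical package layout, relative to the top-level source directory.
printf '%s\n' DESCRIPTION NAMESPACE R/download.R data data/formal.rda man/formal.Rd |
    ignored_paths '^data/.+$'
```

Only `data/formal.rda` is printed: the bare `data` entry does not match, because `.+` requires at least one character after the slash. That is consistent with R CMD build in the answer below reporting that it removed the then-empty data directory.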

However, the problem remains for R CMD INSTALL and R CMD check, which do not take the .Rbuildignore file into account.

Any suggestions for excluding a folder from the install/check processes as well?

asked Apr 30 '14 07:04 by WoDoSc


1 Answer

If you use .Rbuildignore, you should first build and then check your package (it's a build-ignore, not a check-ignore). Here are a few tests in a Debian environment with a random package:

l@np350v5c:~/src/yapomif/pkg$ ls
data  DESCRIPTION  man  NAMESPACE  R

l@np350v5c:~/src/yapomif/pkg$ R
> save(Formaldehyde, file = "data/formal.rda")

l@np350v5c:~/src/yapomif/pkg$ ls -l
totale 20
drwxr-xr-x 2 l l 4096 mag  1 01:31 data
-rw-r--r-- 1 l l  349 apr 25 00:35 DESCRIPTION
drwxr-xr-x 2 l l 4096 apr 25 01:10 man
-rw-r--r-- 1 l l 1189 apr 25 00:33 NAMESPACE
drwxr-xr-x 2 l l 4096 apr 25 01:02 R

l@np350v5c:~/src/yapomif/pkg$ ls -l data/
totale 4
-rw-r--r-- 1 l l 229 mag  1 01:31 formal.rda

Now I create exactly your .Rbuildignore:

l@np350v5c:~/src/yapomif/pkg$ em .Rbuildignore
l@np350v5c:~/src/yapomif/pkg$ cat .Rbuildignore
^data/.+$

OK, let's build:

l@np350v5c:~/src/yapomif/pkg$ cd ..
l@np350v5c:~/src/yapomif$ R CMD build pkg
> tools:::.build_packages()
* checking for file ‘pkg/DESCRIPTION’ ... OK
* preparing ‘yapomif’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
Removed empty directory ‘yapomif/data’
* building ‘yapomif_0.8.tar.gz’

Fine (note the message about yapomif/data). Now check the package:

l@np350v5c:~/src/yapomif$ R CMD check yapomif_0.8.tar.gz
> tools:::.check_packages()
* using log directory ‘/home/l/.src/yapomif/yapomif.Rcheck’
* using R version 3.1.0 (2014-04-10)
* using platform: x86_64-pc-linux-gnu (64-bit)
...

...everything as usual.

Now let's inspect the tarball (moved to my home directory to keep the development directory clean):

l@np350v5c:~/src/yapomif$ mv yapomif_0.8.tar.gz ~
l@np350v5c:~/src/yapomif$ cd
l@np350v5c:~$ tar xvzf yapomif_0.8.tar.gz
l@np350v5c:~$ ls yapomif
DESCRIPTION  man  NAMESPACE  R

So there is no data directory.
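Instead of extracting the archive, you can list its contents with `tar tzf` and look for a top-level data/ entry. A minimal sketch, assuming every entry is prefixed by the package directory (yapomif/ above); the helper name `tarball_has_data` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the source tarball $1 contains a
# top-level data/ directory, e.g. an entry such as yapomif/data/formal.rda.
# Assumes each entry starts with the package directory name.
tarball_has_data() {
    tar tzf "$1" | grep -q '^[^/]*/data/'
}
```

For example, `tarball_has_data yapomif_0.8.tar.gz || echo "no data directory"` should print the message for the tarball built above.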

BUT if you check the source directory instead:

l@np350v5c:~/src/yapomif$ R CMD check pkg

...

Undocumented data sets:
  ‘Formaldehyde’

So, as stated, first build, then check.
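The build-then-check order can be wrapped in a small script so that R CMD check always runs on the tarball rather than on the source directory. This is a sketch under assumptions: the helper names `tarball_name` and `build_then_check` are hypothetical, the tarball is assumed to follow the `<Package>_<Version>.tar.gz` naming seen above (yapomif_0.8.tar.gz) and to be written to the current directory, and the Package and Version fields in DESCRIPTION are assumed to fit on single lines.

```shell
#!/bin/sh
# Hypothetical helper: print the tarball name R CMD build is expected to
# produce for the package in directory $1, i.e. <Package>_<Version>.tar.gz.
# The ( ) body runs in a subshell so its variables do not leak.
tarball_name() (
    desc="$1/DESCRIPTION"
    pkg=$(sed -n 's/^Package:[[:space:]]*//p' "$desc")
    ver=$(sed -n 's/^Version:[[:space:]]*//p' "$desc")
    printf '%s_%s.tar.gz\n' "$pkg" "$ver"
)

# Hypothetical helper: build first (so .Rbuildignore is applied), then run
# R CMD check on the resulting tarball instead of the source directory.
build_then_check() {
    R CMD build "$1" || return 1
    R CMD check "$(tarball_name "$1")"
}
```

Run from the parent of the package directory, e.g. `cd ~/src/yapomif && build_then_check pkg`.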

HTH, Luca

answered Oct 17 '22 12:10 by Luca Braglia