Using data.table package inside my own package

Tags: r, data.table

I am trying to use the data.table package inside my own package. Here is a minimal working example (MWE).

I created a function, test.fun, that builds a small data.table object and then computes the count, sum, and mean of the "Val" column grouped by the "A" column. The code is:

```r
test.fun <- function() {
  library(data.table)
  testdata <- data.table(A = rep(seq(1, 5), 5), Val = rnorm(25))
  setkey(testdata, A)
  # Count, total and mean of Val within each group of A
  res <- testdata[, list(Ct = length(Val), Total = sum(Val), Avg = mean(Val)), by = "A"]
  return(res)
}
```

When I define this function in a regular R session and then run it, it works as expected:

```
> res<-test.fun()
data.table 1.8.0  For help type: help("data.table")
> res
     A Ct      Total        Avg
[1,] 1  5 -0.5326444 -0.1065289
[2,] 2  5 -4.0832062 -0.8166412
[3,] 3  5  0.9458251  0.1891650
[4,] 4  5  2.0474791  0.4094958
[5,] 5  5  2.3609443  0.4721889
```

When I put this function into a package, then install the package, load it, and run the function, I get an error:

```
> library(testpackage)
> res<-test.fun()
data.table 1.8.0  For help type: help("data.table")
Error in `[.data.frame`(x, i, j) : object 'Val' not found
```

Can anybody explain why this happens and what I can do to fix it? Any help is very much appreciated.

asked May 10 '12 03:05 by ruser



2 Answers

Andrie's guess is right, +1. There is a FAQ entry on this (see vignette("datatable-faq")), as well as a newer vignette on importing data.table (vignette("datatable-importing")). The FAQ entry reads:

FAQ 6.9: I have created a package that depends on data.table. How do I ensure my package is data.table-aware so that inheritance from data.frame works?

Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
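As a minimal sketch of option ii, the base-R script below writes a throwaway package skeleton into a temporary directory. The package name testpackage comes from the question; the Title, Author, Maintainer and Description values and the file name R/test.fun.R are hypothetical placeholders. For option i you would instead list data.table under Depends: (per the FAQ, the NAMESPACE import is then not required).

```r
# Sketch of option (ii): data.table under Imports: in DESCRIPTION, plus
# import(data.table) in NAMESPACE. "testpackage" is the question's package
# name; Title/Author/Maintainer/Description are hypothetical placeholders.
pkg <- file.path(tempdir(), "testpackage")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)

writeLines(c(
  "Package: testpackage",
  "Title: Minimal data.table-Aware Package",
  "Version: 0.0.1",
  "Author: A. Placeholder",
  "Maintainer: A. Placeholder <placeholder@example.com>",
  "Description: Summarises a small data.table by group.",
  "License: GPL-3",
  "Imports: data.table"
), file.path(pkg, "DESCRIPTION"))

# import(data.table) puts data.table into this package's namespace imports,
# which is what lets data.table treat code in the package as data.table-aware.
writeLines(c(
  "export(test.fun)",
  "import(data.table)"
), file.path(pkg, "NAMESPACE"))

# With the import in place, no library(data.table) call is needed in the function.
writeLines(c(
  "test.fun <- function() {",
  "  testdata <- data.table(A = rep(seq(1, 5), 5), Val = rnorm(25))",
  "  setkey(testdata, A)",
  "  testdata[, list(Ct = length(Val), Total = sum(Val), Avg = mean(Val)), by = \"A\"]",
  "}"
), file.path(pkg, "R", "test.fun.R"))
```

Assuming data.table is installed, the skeleton can then be installed with install.packages(pkg, repos = NULL, type = "source"), after which library(testpackage) followed by test.fun() should return the grouped summary rather than the `[.data.frame` error.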

Further background: at the top of [.data.table (and other data.table functions), you'll see a switch depending on the result of a call to cedta(). This stands for Calling Environment Data Table Aware. Typing data.table:::cedta reveals how it's done. It relies on the calling package having a namespace, and on that namespace importing or depending on data.table. This is how a data.table can be passed to non-data.table-aware packages (such as functions in base), and those packages can use absolutely standard [.data.frame syntax on the data.table, blissfully unaware that the data.frame is() a data.table, too.

This is also why data.table inheritance used to be incompatible with namespaceless packages, and why, upon user request, we had to ask authors of such packages to add a namespace to their package to be compatible. Happily, now that R adds a default namespace for packages missing one (from v2.14.0), that problem has gone away:

CHANGES IN R VERSION 2.14.0
* All packages must have a namespace, and one is created on installation if not supplied in the sources.
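To make the cedta() idea concrete, here is a rough base-R approximation of one of the conditions it tests: whether the namespace of the calling code imports data.table. This is not data.table's implementation (type data.table:::cedta to read that); imports_pkg is a hypothetical helper name, and the Depends-based path is not covered.

```r
# Hypothetical helper: a rough approximation of ONE condition that
# data.table's cedta() checks. It is not data.table's own code
# (type data.table:::cedta to read that), and it ignores Depends:.
imports_pkg <- function(pkg, target = "data.table") {
  ns <- asNamespace(pkg)  # namespace environment of the package
  # getNamespaceImports() returns a list with one element per imported package
  target %in% names(getNamespaceImports(ns))
}

# cedta() looks at the environment of the *calling* code. For a function
# defined inside a package, the top-level environment is the package namespace.
calling_ns <- getNamespaceName(topenv(environment(stats::sd)))

imports_pkg("stats", "base")  # base is always listed in a package's imports
imports_pkg("stats")          # stats does not import data.table
```

For the package in the question, imports_pkg("testpackage") would presumably return FALSE before import(data.table) is added to NAMESPACE and TRUE afterwards.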

answered Sep 28 '22 05:09 by Matt Dowle


Here is the complete recipe:

  1. Add data.table to Imports in your DESCRIPTION file.

  2. Add the roxygen tag #' @import data.table to the relevant .R file (i.e., the .R file that houses the function throwing the error Error in `[.data.frame`(x, i, j) : object 'Val' not found).

  3. Type library(devtools) and set your working directory to point at the main directory of your R package.

  4. Type document(). This will ensure that your NAMESPACE file includes an import(data.table) line.

  5. Type build()

  6. Type install()

For a nice primer on what build() and install() do, see: http://kbroman.org/pkg_primer/.
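For steps 2 and 4, the roxygen-annotated file might look like the sketch below. It assumes the package's NAMESPACE is generated by roxygen2 (which devtools::document() runs); the @export tag is an addition not mentioned in the recipe, included so that test.fun stays visible after library(), and the title line is a placeholder.

```r
# Sketch of the .R file that houses test.fun (the file name is up to you).
# devtools::document() turns @import data.table into import(data.table) in
# NAMESPACE; @export (an assumption, not part of the recipe) exports test.fun.

#' Count, total and average of Val within each group of A
#'
#' @import data.table
#' @export
test.fun <- function() {
  testdata <- data.table(A = rep(seq(1, 5), 5), Val = rnorm(25))
  setkey(testdata, A)
  # No library(data.table) here: the NAMESPACE import makes data.table available.
  testdata[, list(Ct = length(Val), Total = sum(Val), Avg = mean(Val)), by = "A"]
}
```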

Then, the next time you start an R session, you can jump right in with:

  1. Type library("my_R_package")

  2. Type the name of your function that's housed in the .R file mentioned above.

  3. Enjoy! You should no longer receive the dreaded Error in `[.data.frame`(x, i, j) : object 'Val' not found.

answered Sep 28 '22 05:09 by warship