
Making a package in R that depends on data.table

Tags:

r

data.table

I have to make an R package that depends on the package data.table. However, if I write a function such as the following in the package

randomdt <- function(){
    dt <- data.table(random = rnorm(10))
    dt[dt$random > 0]
}

the call to [ uses the method for data.frame rather than the one for data.table, and therefore the error

Error in `[.data.frame`(x, i) : undefined columns selected

appears. Usually I would solve this with get('[.data.table') or something similar (package::function being the simplest), but that does not seem to work here. After all, [ is a primitive function, and I don't know how its methods are dispatched.

So, how can I call the data.table [ function from my package?

Usobi asked Oct 09 '15 13:10



1 Answer

Updated based on some feedback from MichaelChirico and comments by Arun and Soheil.

Roughly speaking, there are two approaches you might consider. The first is building the dependency into your package itself, while the second is including lines in your R code that test for the presence of data.table (and possibly even install it automatically if it is not found).

The data.table FAQ specifically addresses this in 6.9, and states that you can ensure that data.table is appropriately loaded by your package by:

Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
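As a rough illustration of option ii), the two files might contain something like the following. Only the data.table-related lines are shown; the rest of a real DESCRIPTION (package name, version, and so on) is omitted, and the exact layout is an assumption rather than a quote from the FAQ.

In DESCRIPTION:

```text
Imports:
    data.table
```

In NAMESPACE:

```r
import(data.table)
```

With both in place, the package should count as data.table-aware, so a function like randomdt in the question can call data.table() and subset with [ without any get() or :: workaround.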

As noted in the comments, this is standard R practice and is used by numerous packages.

An alternative approach is to create specific lines of code which test for and import the required packages as part of your code. This is, I would contend, not the ideal solution given the elegance of using the option provided above. However, it is technically possible.

A simple way of doing this would be to use either require or library to check for the existence of data.table, with an error thrown if it could not be attached. You could even use a simple set of conditional statements to run install.packages to install what you need if loading them fails.
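A minimal sketch of that check, assuming a hypothetical helper called attach_or_stop (the name and the error message are made up for illustration, and nothing here comes from data.table itself):

```r
# Hypothetical helper: attach a package, or stop with a clear message.
attach_or_stop <- function(pkg) {
  # require() returns FALSE (with a warning) rather than raising an error
  # when the package is missing, so its result must be checked explicitly.
  ok <- suppressWarnings(
    require(pkg, character.only = TRUE, quietly = TRUE)
  )
  if (!ok) {
    stop("Package '", pkg, "' is required but could not be attached; ",
         "try install.packages(\"", pkg, "\").", call. = FALSE)
  }
  invisible(TRUE)
}
```

Calling attach_or_stop("data.table") at the top of a script would then either attach data.table or halt before any code that needs it runs. The automatic-install variant would replace the stop() call with install.packages(pkg) followed by a second attempt to attach.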

Yihui Xie (of knitr fame) has a great post about the difference between library and require, and makes a strong case for just using library in cases where the package is absolutely essential for the upcoming code.

TARehman answered Nov 06 '22 23:11
