
Using data.table i and j arguments in functions


I am trying to write some wrapper functions to reduce code duplication with data.table.

Here is an example using mtcars. First, set up some data:

    library(data.table)
    data(mtcars)
    mtcars$car <- factor(gsub("(.*?) .*", "\\1", rownames(mtcars)), ordered=TRUE)
    mtcars <- data.table(mtcars)

Now, here is what I would usually write to get a summary of counts by group. In this case I am grouping by car:

    mtcars[, list(Total=length(mpg)), by="car"][order(car)]

          car Total
          AMC     1
     Cadillac     1
       Camaro     1
    ...
       Toyota     2
      Valiant     1
        Volvo     1

The complication is that, since the arguments i and j are evaluated in the frame of the data.table, one has to use eval(...) if you want to pass in variables:

This works:

    group <- "car"
    mtcars[, list(Total=length(mpg)), by=eval(group)]

But now I want to order the results by the same grouping variable. I can't get any variant of the following to give me correct results. Notice how I always get a single row of results, rather than the ordered set.

    mtcars[, list(Total=length(mpg)), by=eval(group)][order(group)]

       car Total
     Mazda     2

I know why: it's because group is evaluated in the parent.frame, not the frame of the data.table.
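To make that concrete, here is a minimal base R sketch of what order() sees when it is handed the string rather than the column. The vector cars is a made-up stand-in for the grouped column, not part of the example data above.

```r
# 'cars' is a hypothetical stand-in for the values of the grouping column
cars  <- c("Volvo", "AMC", "Mazda")
group <- "car"

# order() receives the length-one character vector "car",
# so it can only ever return the single index 1
idx <- order(group)

# indexing with that result picks out the first element only
first_only <- cars[order(group)]

# ordering by the values themselves gives the full sorted result
sorted <- cars[order(cars)]
```

The same thing happens in the chained call: the index is computed from the string held in group, so only the first row comes back.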

How can I evaluate group in the context of the data.table?

More generally, how can I use this inside a function? I need the following function to give me all the results, not just the first row of data:

    tableOrder <- function(x, group){
      x[, list(Total=length(mpg)), by=eval(group)][order(group)]
    }

    tableOrder(mtcars, "car")
Andrie asked Mar 14 '12 16:03



1 Answer

Gavin and Josh are right. This answer is only to add more background. The idea is that not only can you pass variable column names into a function like that, but expressions of column names, using quote().

    group = quote(car)
    mtcars[, list(Total=length(mpg)), by=group][order(group)]

        group Total
          AMC     1
     Cadillac     1
       ...
       Toyota     2
      Valiant     1
        Volvo     1

Although admittedly more difficult to start with, it can be more flexible. That's the idea, anyway. Inside functions you need substitute(), like this:

    tableOrder = function(x,.expr) {
        .expr = substitute(.expr)
        ans = x[,list(Total=length(mpg)),by=.expr]
        setkeyv(ans, head(names(ans),-1))    # see below re feature request #1780
        ans
    }

    tableOrder(mtcars, car)

        .expr Total
          AMC     1
     Cadillac     1
       Camaro     1
       ...
       Toyota     2
      Valiant     1
        Volvo     1

    tableOrder(mtcars, substring(car,1,1))  # an expression, not just a column name

          .expr Total
     [1,]     A     1
     [2,]     C     3
     [3,]     D     3
     ...
     [8,]     P     2
     [9,]     T     2
    [10,]     V     2

    tableOrder(mtcars, list(cyl,gear%%2))   # by two expressions, so head(,-1) above

         cyl gear Total
    [1,]   4    0     8
    [2,]   4    1     3
    [3,]   6    0     4
    [4,]   6    1     3
    [5,]   8    1    14

A new argument, keyby, was added in v1.8.0 (July 2012), making it simpler:

    tableOrder = function(x,.expr) {
        .expr = substitute(.expr)
        x[,list(Total=length(mpg)),keyby=.expr]
    }
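If the wrapper has to accept the column name as a string, as in the question, one possible sketch is to keep by=eval(group) from the question and sort the result by name with setorderv(). The function name tableOrderChr is hypothetical, and the example groups the stock mtcars data by cyl rather than by the derived car column.

```r
library(data.table)

# stock mtcars, without the derived 'car' column from the question
dt <- as.data.table(mtcars)

# hypothetical string-based variant of tableOrder: 'group' is a column name
tableOrderChr <- function(x, group) {
  ans <- x[, list(Total = length(mpg)), by = eval(group)]
  setorderv(ans, group)   # sort by the column that 'group' names
  ans[]
}

res <- tableOrderChr(dt, "cyl")
res
```

Because setorderv() takes column names as a character vector, the string is never mistaken for the data to be ordered.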

Comments and feedback in the area of i,j and by variable expressions are most welcome. The other thing you can do is have a table where a column contains expressions and then look up which expression to put in i, j or by from that table.
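A rough sketch of that lookup idea, under several assumptions: the expressions are stored as text in a small table (the names lookup, name and expr are made up for illustration), parsed with parse(), and spliced into the call with bquote() before evaluation. This is only one way to do it, not necessarily what the answer had in mind.

```r
library(data.table)

dt <- as.data.table(mtcars)

# hypothetical lookup table: one row per named grouping expression, stored as text
lookup <- data.table(
  name = c("by_cyl", "by_gear_parity"),
  expr = c("cyl", "gear %% 2")
)

# fetch one expression and turn the text into an unevaluated call
by_expr <- parse(text = lookup[name == "by_gear_parity", expr])[[1]]

# bquote() splices the call into the keyby argument; eval() then runs the query
res <- eval(bquote(dt[, list(Total = .N), keyby = .(by_expr)]))
res
```

Here .(by_expr) is bquote()'s substitution marker rather than data.table's list alias, so the query that actually runs has gear %% 2 written directly in keyby.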

Matt Dowle answered Sep 28 '22 03:09