I am trying to write some wrapper functions to reduce code duplication with data.table.
Here is an example using mtcars. First, set up some data:
    library(data.table)
    data(mtcars)
    mtcars$car <- factor(gsub("(.*?) .*", "\\1", rownames(mtcars)), ordered=TRUE)
    mtcars <- data.table(mtcars)
Now, here is what I would usually write to get a summary of counts by group. In this case I am grouping by car:
    mtcars[, list(Total=length(mpg)), by="car"][order(car)]

         car Total
         AMC     1
    Cadillac     1
      Camaro     1
    ...
      Toyota     2
     Valiant     1
       Volvo     1
The complication is that, since the arguments i and j are evaluated in the frame of the data.table, you have to use eval(...) if you want to pass in variables:
This works:
    group <- "car"
    mtcars[, list(Total=length(mpg)), by=eval(group)]
But now I want to order the results by the same grouping variable. I can't get any variant of the following to give me correct results. Notice how I always get a single row of results, rather than the ordered set.
    mtcars[, list(Total=length(mpg)), by=eval(group)][order(group)]

      car Total
    Mazda     2
I know why: it's because group is evaluated in the parent.frame, not the frame of the data.table.
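To make the diagnosis above concrete, here is a minimal base R sketch (no data.table involved, and the `cars` data frame is only an illustrative copy of mtcars). It shows that ordering the variable `group` sorts a one-element character vector, whereas looking the column up by name yields one index per row.

```r
group <- "car"

# order() sees the one-element character vector "car", not a column,
# so it returns a single index: subsetting with it keeps only row 1.
one_index <- order(group)
one_index                      # 1

# Illustrative copy of the question's data as a plain data frame.
cars <- datasets::mtcars
cars$car <- gsub("(.*?) .*", "\\1", rownames(cars))

# Looking the column up by its name gives a full permutation of the rows.
idx <- order(cars[[group]])
length(idx) == nrow(cars)      # TRUE
```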
How can I evaluate group in the context of the data.table?
More generally, how can I use this inside a function? I need the following function to give me all the results, not just the first row of data:
    tableOrder <- function(x, group){
      x[, list(Total=length(mpg)), by=eval(group)][order(group)]
    }

    tableOrder(mtcars, "car")
Gavin and Josh are right. This answer is only to add more background. The idea is that not only can you pass variable column names into a function like that, but expressions of column names, using quote().
    group = quote(car)
    mtcars[, list(Total=length(mpg)), by=group][order(group)]

       group Total
         AMC     1
    Cadillac     1
    ...
      Toyota     2
     Valiant     1
       Volvo     1
Although admittedly more difficult to start with, it can be more flexible. That's the idea, anyway. Inside functions you need substitute(), like this:
    tableOrder = function(x, .expr) {
      .expr = substitute(.expr)
      ans = x[, list(Total=length(mpg)), by=.expr]
      setkeyv(ans, head(names(ans), -1))    # see below re feature request #1780
      ans
    }

    tableOrder(mtcars, car)

       .expr Total
         AMC     1
    Cadillac     1
      Camaro     1
    ...
      Toyota     2
     Valiant     1
       Volvo     1

    tableOrder(mtcars, substring(car,1,1))    # an expression, not just a column name

          .expr Total
     [1,]     A     1
     [2,]     C     3
     [3,]     D     3
    ...
     [8,]     P     2
     [9,]     T     2
    [10,]     V     2

    tableOrder(mtcars, list(cyl,gear%%2))     # by two expressions, so head(,-1) above

         cyl gear Total
    [1,]   4    0     8
    [2,]   4    1     3
    [3,]   6    0     4
    [4,]   6    1     3
    [5,]   8    1    14
A new argument keyby was added in v1.8.0 (July 2012), making it simpler:
    tableOrder = function(x, .expr) {
      .expr = substitute(.expr)
      x[, list(Total=length(mpg)), keyby=.expr]
    }
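For the case in the question, where the grouping column arrives as a character string rather than an unquoted expression, a similar wrapper can be sketched with keyby. This is only a sketch under assumptions: `table_order_chr` and `group_cols` are hypothetical names, `.N` stands in for `length(mpg)`, and it relies on by/keyby accepting a character vector of column names held in a variable (so `group_cols` must not itself be a column of `x`).

```r
library(data.table)

# Same setup as in the question, on a copy named `cars`.
cars <- datasets::mtcars
cars$car <- factor(gsub("(.*?) .*", "\\1", rownames(cars)), ordered = TRUE)
cars <- data.table(cars)

# Hypothetical wrapper: group_cols is a character vector of column names.
table_order_chr <- function(x, group_cols) {
  stopifnot(is.character(group_cols), all(group_cols %in% names(x)))
  # keyby groups and then keys (sorts) the result by the grouping columns.
  x[, list(Total = .N), keyby = group_cols]
}

res <- table_order_chr(cars, "car")
res
```

Because the result is keyed by the grouping column, no separate `[order(...)]` step is needed.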
Comments and feedback in the area of i, j and by variable expressions are most welcome. The other thing you can do is have a table where a column contains expressions and then look up which expression to put in i, j or by from that table.
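One possible reading of that lookup idea is sketched below. Everything here is an assumption made for illustration: the `groupings` table, its `name` and `expr` columns, and the `count_by` helper are hypothetical, and the expressions are stored as text and parsed, which is just one of several ways to keep expressions in a column.

```r
library(data.table)

# Same setup as in the question, on a copy named `cars`.
cars <- datasets::mtcars
cars$car <- factor(gsub("(.*?) .*", "\\1", rownames(cars)), ordered = TRUE)
cars <- data.table(cars)

# Hypothetical lookup table: one row per named grouping expression.
groupings <- data.table(
  name = c("make", "initial"),
  expr = c("car", "substring(car, 1, 1)")
)

# Hypothetical helper: find the expression by name and splice it into keyby.
count_by <- function(x, lookup, name) {
  txt <- lookup$expr[lookup$name == name]
  stopifnot(length(txt) == 1L)
  e <- parse(text = txt)[[1]]
  # Build the call x[, list(Total = .N), keyby = list(grp = <e>)], then run it.
  cl <- substitute(x[, list(Total = .N), keyby = list(grp = E)], list(E = e))
  eval(cl)
}

by_initial <- count_by(cars, groupings, "initial")
by_initial
```

Parsing text trades safety for convenience, so this only makes sense when the lookup table is under your own control.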