 

In R data.table, how do I pass variable parameters to an expression?

I am stuck on a small data.table problem in R, and your help is much appreciated. How do I make the following work?

getResult <- function(dt, expr, gby) {
  e <- substitute(expr)
  b <- substitute(gby)
  return(dt[,eval(e),by=b])
}

v1 <- "Sepal.Length"
v2 <- "Species"

library(data.table)
dt <- data.table(iris)
rDT <- getResult(dt, sum(v1, na.rm=TRUE), v2)

I get the following error:

Error in sum(v1, na.rm = TRUE) : invalid 'type' (character) of argument

Both v1 and v2 are passed in from another program as character strings, so I can't use v1 <- quote(Sepal.Length), which does seem to work.
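The error comes from v1 being a character string rather than a column reference. As a small base R sketch (not from the original thread), as.name() converts a string into the same kind of symbol object that quote(Sepal.Length) returns, which is one way to bridge the gap when names arrive as strings:

```r
v1 <- "Sepal.Length"
class(v1)                          # "character": a string, which sum() rejects

s <- as.name(v1)                   # convert the string into a symbol
class(s)                           # "name"
identical(s, quote(Sepal.Length))  # TRUE: the same object quote() gives
```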

asked May 20 '12 at 16:05 by user1157129




1 Answer

An alternative to flodel's answer in the comments is to build the expressions from the strings with parse():

e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))

b <- parse(text = v2)

rDT2 <- dt[, eval(e), by = eval(b)]

#               b    V1
# [1,]     setosa 250.3
# [2,] versicolor 296.8
# [3,]  virginica 329.4
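To see what parse() actually builds from the pasted strings, here is a small base R check using the same v1 and v2 values (a sketch; no data.table needed). The key point is that the string "Sepal.Length" ends up as a symbol inside an unevaluated call:

```r
v1 <- "Sepal.Length"
v2 <- "Species"

# parse() turns the pasted text into an unevaluated expression vector
e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))
class(e)            # "expression"
deparse(e[[1]])     # "sum(Sepal.Length, na.rm = TRUE)"

# the column name is now a symbol, not a string
all.vars(e)         # "Sepal.Length"

# parsing v2 yields the bare symbol Species for by =
b <- parse(text = v2)
b[[1]]              # Species
```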

EDIT:

To wrap this in a function:

getResult <- function(dt, expr, gby){
  return(dt[, eval(expr), by = eval(gby)])
}

(dtR <- getResult(dt = dt, expr = e, gby = b))
# gives the same result as above
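If you would rather have the function accept the raw strings, one option is to swap paste0/parse for base R's bquote() and as.name(), which splice a symbol into a call without going through text. This is a sketch assuming the data.table package is installed; getResultStr is a hypothetical name, and it relies on by accepting a character vector of column names:

```r
library(data.table)

# Hypothetical helper: col and gby are plain character strings
getResultStr <- function(dt, col, gby) {
  # as.name() turns "Sepal.Length" into the symbol Sepal.Length,
  # and bquote() splices it into the call wherever .() appears
  e <- bquote(sum(.(as.name(col)), na.rm = TRUE))
  # by accepts a character vector of column names directly
  dt[, eval(e), by = gby]
}

dt <- data.table(iris)
res <- getResultStr(dt, "Sepal.Length", "Species")
res
```

Because the call is built from a symbol rather than hidden behind get(), data.table can still see which column j uses.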


EDIT from Matthew: There's a subtle reason why the paste0 and eval/quote methods can also be faster than get() in some cases. One of the reasons grouping is fast is that data.table inspects j to see which columns it uses, then subsets only those columns (FAQ 1.12 and 3.1). It uses base::all.vars(j) to do that. When get() is used in j, the column being used is hidden from all.vars, so data.table falls back to subsetting all the columns in case the j expression needs them (much like when the .SD symbol is used in j, which is the problem .SDcols was added to solve).

If all the columns are used anyway, it makes no difference. But if DT is, say, 1e7 x 100, then a grouped j=sum(V1) should be much faster than a grouped j=sum(get("V1")) for that reason. At least, that's what is supposed to happen; if it doesn't, it may be a bug. On the other hand, if many queries are constructed dynamically and repeated, the time spent in paste0 and parse might start to matter. It really depends. Setting verbose=TRUE should print a message about which columns have been detected as used by j, so you can check.
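The all.vars() behaviour described above can be checked directly in base R. This sketch only shows what all.vars() reports; how a given data.table version reacts to it may differ:

```r
# A plain column reference is visible to all.vars()
all.vars(quote(sum(V1)))          # "V1"

# Inside get(), "V1" is only a string constant, so no column is detected
all.vars(quote(sum(get("V1"))))   # character(0)
```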

answered Oct 23 '22 at 07:10 by BenBarnes
