I am stuck on a small R issue with data.table. Your help is much appreciated. How do I do this:
library(data.table)

getResult <- function(dt, expr, gby) {
  e <- substitute(expr)
  b <- substitute(gby)
  return(dt[, eval(e), by = b])
}
v1 <- "Sepal.Length"
v2 <- "Species"
dt <- data.table(iris)
# This call fails: v1 holds a character string, not a column symbol
rDT <- getResult(dt, sum(v1, na.rm = TRUE), v2)
I get the following error:
Error in sum(v1, na.rm = TRUE) : invalid 'type' (character) of argument
Now, both v1 and v2 are passed in from another program as character variables, so I can't do v1 <- quote(Sepal.Length), which seems to work.
An alternative to flodel's answer in the comments could be
e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))
b <- parse(text = v2)
rDT2 <- dt[, eval(e), by = eval(b)]
# b V1
# [1,] setosa 250.3
# [2,] versicolor 296.8
# [3,] virginica 329.4
EDIT:
And to put this into a function,
getResult <- function(dt, expr, gby) {
  return(dt[, eval(expr), by = eval(gby)])
}
(dtR <- getResult(dt = dt, expr = e, gby = b))
# gives the same result as above
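Since v1 and v2 arrive as character strings, the parsing step can also live inside the function, so the caller passes the strings directly. The following is only a minimal sketch, assuming the aggregation is always sum(..., na.rm = TRUE); getResultChr is a hypothetical name chosen for illustration and is not part of data.table. It relies on by accepting a character vector of column names.

```r
library(data.table)

# Hypothetical helper (not part of data.table): takes the value column and
# the grouping column as character strings.
getResultChr <- function(dt, col, gby) {
  # Build the j expression from the column name, e.g. sum(Sepal.Length, na.rm = TRUE)
  e <- parse(text = paste0("sum(", col, ", na.rm = TRUE)"))
  # by accepts a character vector of column names, so gby needs no parsing
  dt[, eval(e), by = gby]
}

dt <- data.table(iris)
res <- getResultChr(dt, "Sepal.Length", "Species")
print(res)
```

Because the expression is assembled with paste0 and parse, col should come from a trusted source; anything in that string is evaluated as R code.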
EDIT from Matthew:
There's a subtle reason why the paste0 and eval/quote methods can be faster than get in some cases, too. One of the reasons grouping can be fast is that data.table inspects j to see which columns it uses, then subsets only those columns (FAQ 1.12 and 3.1). It uses base::all.vars(j) to do that. When get() is used in j, the column being used is hidden from all.vars, and data.table falls back to subsetting all the columns just in case the j expression needs them (much like when the .SD symbol is used in j, which is the problem .SDcols was added to solve). If all the columns are used anyway then it doesn't make a difference, but if DT is, say, 1e7 x 100 then a grouped j=sum(V1) should be much faster than a grouped j=sum(get("V1")) for that reason. At least, that's what's supposed to happen, and if it doesn't then it may be a bug. If, on the other hand, many queries are being constructed dynamically and repeated, then the time to paste0 and parse might come into it. It all depends, really. Setting verbose=TRUE should print a message about which columns have been detected as used by j, so that can be checked.
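The column-detection point can be seen with base R alone, without data.table. This small sketch only illustrates what all.vars reports for the two forms of j; the names j_direct and j_get are made up for the example, and it does not show data.table's internal code.

```r
# Two unevaluated j expressions: one names the column directly,
# the other looks it up by string via get()
j_direct <- quote(sum(V1))
j_get    <- quote(sum(get("V1")))

# all.vars() returns the variable names that appear in an expression.
# Function names are excluded by default, and "V1" inside get() is a
# string constant, not a name, so it is not reported.
vars_direct <- all.vars(j_direct)
vars_get    <- all.vars(j_get)

print(vars_direct)  # "V1"
print(vars_get)     # character(0)
```

With nothing reported for the get() form, there is no way to tell from the expression alone which columns it needs, which is why the fallback to all columns described above applies.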