Using variable in data.table group by clause

Tags: list, r, data.table

I have a data.table that I am trying to summarise. This is my approach:

library(data.table)

dtIris <- data.table(iris)
dt1 <- dtIris[, list(AvgSepalWidth = mean(Sepal.Width)), 
              by=list(TrimSpecies = substr(Species,1,3),Petal.Length)]

I want to use a variable to identify one of the columns to group by, but I can't get it to evaluate the variable inside the list. It just treats it as a string and throws an error.

myvar <- "Petal.Length"
dt1 <- dtIris[, list(AvgSepalWidth = mean(Sepal.Width)), 
              by=list(TrimSpecies = substr(Species,1,3),myvar)]

I have tried noquote(), eval() and parse(text=), all to no avail. Any guidance would be really appreciated.

Dan asked Jul 24 '15 01:07


1 Answer

You can use eval(parse(text=myvar)) or get(myvar), though the grouping column will then be named parse or get respectively (you can rename it afterwards).

myvar <- "Petal.Length"
dtIris[, list(AvgSepalWidth = mean(Sepal.Width)), 
              by=list(TrimSpecies = substr(Species,1,3), eval(parse(text=myvar)))]

dtIris[, list(AvgSepalWidth = mean(Sepal.Width)), 
              by=list(TrimSpecies = substr(Species,1,3), get(myvar))]
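As a minimal sketch of the "rename it afterwards" step, the get(myvar) result can be stored and its grouping column renamed in place with setnames(). The name res is only an illustrative choice, and the sketch assumes the column is auto-named get as described above.

```r
library(data.table)

dtIris <- data.table(iris)
myvar <- "Petal.Length"

# Group by the trimmed species and by the column whose name is held in myvar
res <- dtIris[, list(AvgSepalWidth = mean(Sepal.Width)),
              by = list(TrimSpecies = substr(Species, 1, 3), get(myvar))]

# The unnamed grouping column is assumed to be called "get" here;
# rename it by reference to the name stored in myvar
setnames(res, "get", myvar)
names(res)
```

If the automatic name differs in your data.table version, setnames() also accepts a column position for its old argument.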

I was not sure how to do it in a way that preserves the name as you want. (Edit: name the list yourself with by=setNames(list(...), c('TrimSpecies', myvar)) - thanks @thelatemail!)
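Spelled out in full, the setNames() suggestion might look like the sketch below. The two list elements are my reading of what the elided arguments stand for in this example, and res is just an illustrative name.

```r
library(data.table)

dtIris <- data.table(iris)
myvar <- "Petal.Length"

# Build the grouping list unnamed, then name both elements at once so the
# second grouping column takes its name from the string in myvar
res <- dtIris[, list(AvgSepalWidth = mean(Sepal.Width)),
              by = setNames(list(substr(Species, 1, 3), get(myvar)),
                            c("TrimSpecies", myvar))]
names(res)
```

This keeps the grouping in a single call and avoids a separate rename step.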


Edit - out of interest, in response to some comments below.

library(rbenchmark)
benchmark(
    eval=dtIris[, list(AvgSepalWidth = mean(Sepal.Width)), 
              by=list(TrimSpecies = substr(Species,1,3), eval(parse(text=myvar)))],
    get=dtIris[, list(AvgSepalWidth = mean(Sepal.Width)), 
              by=list(TrimSpecies = substr(Species,1,3), get(myvar))],
    chain=dtIris[, TrimSpecies := substr(Species,1,3)][,list(AvgSepalWidth = mean(Sepal.Width)),by=c("TrimSpecies",myvar)][,TrimSpecies:=NULL][]
)
   test replications elapsed relative user.self sys.self user.child sys.child
3 chain          100   0.151    1.987     0.250        0          0         0
1  eval          100   0.079    1.039     0.097        0          0         0
2   get          100   0.076    1.000     0.094        0          0         0

In this run get is marginally faster than eval(parse(text=..)), and both are roughly twice as fast as defining TrimSpecies, using the character form of by and then removing it (chaining data.tables).
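For readability, the character form of by used in the chain test can also be written as separate steps. This is only a sketch of the same idea, with res as an illustrative name. One difference from the chained one-liner: there the final TrimSpecies:=NULL acts on the aggregated result, whereas here the helper column is removed from dtIris itself and kept in the result.

```r
library(data.table)

dtIris <- data.table(iris)
myvar <- "Petal.Length"

# Add the derived grouping column to dtIris by reference
dtIris[, TrimSpecies := substr(Species, 1, 3)]

# With a character vector in by, the grouping columns keep their own names
res <- dtIris[, list(AvgSepalWidth = mean(Sepal.Width)),
              by = c("TrimSpecies", myvar)]

# Drop the helper column again so dtIris is left as it started
dtIris[, TrimSpecies := NULL]
names(res)
```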

mathematical.coffee answered Sep 22 '22 03:09