
Use a character vector in the `by` argument

In R's data.table package, is there a way to pass a character vector of column names to the by argument?

Here is an example using mtcars, together with the desired output:

 library(data.table)
 mtcars <- data.table(mtcars)
 ColSelect <- 'cyl' # One Column Option
 mtcars[,.( AveMpg = mean(mpg)), by = .(ColSelect)] # Doesn't work

 # Desired Output 
    cyl   AveMpg
 1:   6 19.74286
 2:   4 26.66364
 3:   8 15.10000

I know this is possible when assigning column names in j, by wrapping the character vector in parentheses:

 ColSelect <- 'AveMpg' # Column to be assigned for average mpg value
 mtcars[,(ColSelect):= mean(mpg), by = .(cyl)]
 head(mtcars)

    mpg cyl disp  hp drat    wt  qsec vs am gear carb   AveMpg
1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 19.74286
2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 19.74286
3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 26.66364
4: 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 19.74286
5: 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 15.10000
6: 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 19.74286

What should I put in the by argument to get the same behaviour there?

M. Stamp asked Oct 17 '22 23:10



1 Answer

According to the by section of ?data.table, by accepts (among other forms):

  • a single character string containing comma separated column names (where spaces are significant since column names may contain spaces
    even at the start or end): e.g., DT[, sum(a), by="x,y,z"]
  • a character vector of column names: e.g., DT[, sum(a), by=c("x", "y")]

So yes, you can use the approach in @cccmir's answer. Wrapping the name in c(), as @akrun mentioned, also works, but it is only needed when you want to group by multiple columns.

The reason you cannot use the .() syntax is that in data.table .() is an alias for list(). According to the same help entry for by, the list() form expects expressions of (unquoted) column names, not character strings.
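As a minimal sketch of the difference, assuming the data.table package is installed: dt is a name chosen here for a fresh copy of mtcars, and ColSelect is the variable from the question.

```r
library(data.table)

dt <- as.data.table(mtcars)   # fresh copy of the built-in data set (name chosen for this sketch)
ColSelect <- "cyl"            # grouping column held as a character string

# Character vector passed straight to `by`: no .() wrapper
res_chr <- dt[, .(AveMpg = mean(mpg)), by = ColSelect]

# The .() form takes the unquoted column name instead
res_sym <- dt[, .(AveMpg = mean(mpg)), by = .(cyl)]

# Both calls should produce the same grouped result
print(res_chr)
print(all.equal(res_chr, res_sym))
```

The only change from the question's attempt is dropping .() around ColSelect.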

Following the examples in the by help, if you want to group by multiple variables and pass their names as characters, you can use either of:

  1. mtcars[,.( AveMpg = mean(mpg)), by = "cyl,am"]
  2. mtcars[,.( AveMpg = mean(mpg)), by = c("cyl","am")]
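The two forms above can also be driven from a variable. A short sketch, assuming data.table is installed; GroupCols and dt are hypothetical names introduced only for this illustration.

```r
library(data.table)

dt <- as.data.table(mtcars)      # fresh copy of the built-in data set
GroupCols <- c("cyl", "am")      # hypothetical variable holding the grouping columns

# Character vector stored in a variable
res_vec <- dt[, .(AveMpg = mean(mpg)), by = GroupCols]

# Single comma-separated string (no spaces, since spaces are significant)
res_str <- dt[, .(AveMpg = mean(mpg)), by = "cyl,am"]

# Both forms should group identically
print(res_vec)
print(all.equal(res_vec, res_str))
```

The vector form is usually easier to build programmatically, since the comma-separated string requires pasting names together without stray spaces.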
Mike H. answered Oct 21 '22 09:10
