data.table column order when using lapply and get

Can someone help me understand why the two versions of the lapply operation below, with and without get(), don't produce the same result? When using get(), the result columns get mixed up.

library(data.table)
dt <- data.table(v1 = c(1,2), v2 = c(3,4), type = c('A', 'B'))

   v1 v2 type
1:  1  3    A
2:  2  4    B

col_in <- c('v2', 'v1')
col_out <- paste0(col_in, '.new')

Accessing 'type' the hard-coded way

dt[, (col_out) := lapply(.SD, function(x){x * min(x[type == 'A'])}), .SDcols = col_in]

produces the expected result:

   v1 v2 type v2.new v1.new
1:  1  3    A      9      1
2:  2  4    B     12      2

However, when accessing 'type' via get()

dt[, (col_out) := lapply(.SD, function(x){x * min(x[get('type') == 'A'])}), .SDcols = col_in]

the expected values for v1.new are in v2.new and vice versa:

   v1 v2 type v2.new v1.new
1:  1  3    A      1      9
2:  2  4    B      2     12

Note: This is a minimal toy example that I distilled from a more complex operation that I'm trying to implement. The name of the 'type' variable is given as an input parameter.

Steffen J. asked Jun 15 '18 15:06


2 Answers

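Interesting! Thanks for sharing! Judging by the outputs, when get() is used in j the columns of .SD seem to come back in the table's own column order (v1, v2) rather than in the order given by .SDcols (v2, v1), so the results no longer line up with col_out (possibly a bug?).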
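A quick way to inspect the column order that .SD receives in each case (a minimal sketch using the toy data from the question; the names order_plain and order_get are only illustrative, and the result of the get() variant may depend on the data.table version):

```r
library(data.table)

dt <- data.table(v1 = c(1, 2), v2 = c(3, 4), type = c('A', 'B'))
col_in <- c('v2', 'v1')

# Column names of .SD when j does not call get()
order_plain <- dt[, names(.SD), .SDcols = col_in]

# Column names of .SD when j also calls get(); compare with order_plain
order_get <- dt[, {get('type'); names(.SD)}, .SDcols = col_in]

print(order_plain)
print(order_get)
```

If the two printed vectors differ, assigning the lapply result to col_out by position will mislabel the new columns.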

Two ways to avoid this:

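  1. Move the type == 'A' part outside the dt[, lapply(...)] call, so that get() is no longer used in the same j expression as .SD:

    # Either of the next two lines works; the second one uses get()
    referenceRows <- which(dt[, type == 'A'])
    referenceRows <- which(dt[, get('type') == 'A'])
    # Assign by reference so that the new columns are added to dt
    dt[, (col_out) := lapply(.SD, function(x){x * min(x[referenceRows])}), .SDcols = col_in]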
    
       v1 v2 type v2.new v1.new
    1:  1  3    A      9      1
    2:  2  4    B     12      2
    
  2. First create the new columns in a separate table, then use setnames to make sure that they are assigned the proper column names (setnames maps old names to new names, so the column order does not matter). Finally, bind the two parts together with cbind:

    dtNew <- dt[, lapply(.SD, function(x){x * min(x[type == 'A'])}), .SDcols = col_in]
    setnames(dtNew, col_in, col_out)
    cbind(dt, dtNew)
    
    
       v1 v2 type v2.new v1.new
    1:  1  3    A      9      1
    2:  2  4    B     12      2
    

Same result when using get() (although the new columns appear in a different order):

    dtNew <- dt[, lapply(.SD, function(x){x * min(x[get('type') == 'A'])}), .SDcols = col_in]
    setnames(dtNew, col_in, col_out)
    cbind(dt, dtNew)


       v1 v2 type v1.new v2.new
    1:  1  3    A      1      9
    2:  2  4    B      2     12
Marvin Steijaert answered Oct 24 '22 09:10


Another way is to use a cool R feature called computing on the language (not specific to data.table): instead of calling get, build the required j argument as a language object with the substitute function.
This also works when grouping.

library(data.table)
dt <- data.table(v1 = c(1,2), v2 = c(3,4), type = c('A', 'B'))
col_in <- c('v2', 'v1')
col_out <- paste0(col_in, '.new')

col_where <- 'type'
qj <- substitute(.col_out := lapply(.SD, function(x){x * min(x[.col_where == 'A'])}),
                 list(.col_out=col_out, .col_where=as.name(col_where)))
print(qj)
#`:=`(c("v2.new", "v1.new"), lapply(.SD, function(x) {
#    x * min(x[type == "A"])
#}))

dt[, eval(qj), .SDcols = col_in][]
#      v1    v2   type v2.new v1.new
#   <num> <num> <char>  <num>  <num>
#1:     1     3      A      9      1
#2:     2     4      B     12      2

More about this feature can be found in the R Language Definition, in the "Computing on the language" chapter.
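To illustrate the grouping claim, here is a minimal sketch of the same substitute approach combined with by. The extended data and the grouping column grp are hypothetical and not part of the original question; each group is assumed to contain at least one 'A' row, otherwise min() would be taken over an empty vector.

```r
library(data.table)

# Hypothetical data: 'grp' is an illustrative grouping column
dt <- data.table(v1 = c(1, 2, 3, 4),
                 v2 = c(5, 6, 7, 8),
                 type = c('A', 'B', 'A', 'B'),
                 grp = c('x', 'x', 'y', 'y'))
col_in <- c('v2', 'v1')
col_out <- paste0(col_in, '.new')
col_where <- 'type'

# Build the j expression with the column name inserted as a symbol
qj <- substitute(.col_out := lapply(.SD, function(x) x * min(x[.col_where == 'A'])),
                 list(.col_out = col_out, .col_where = as.name(col_where)))

# Evaluate per group; the minimum is taken over the 'A' rows of each group
dt[, eval(qj), by = grp, .SDcols = col_in]
print(dt)
```

Because the expression contains the plain symbol type rather than a get() call, data.table can see which columns j uses, and the new columns should line up with col_out within each group.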

jangorecki answered Oct 24 '22 11:10