data.table column order when using lapply and get

Tags: r, lapply, data.table

Can someone help me understand why the two versions of the lapply operation below, with and without get(), don't produce the same result? When using get(), the result columns get mixed up.

library(data.table)
dt <- data.table(v1 = c(1,2), v2 = c(3,4), type = c('A', 'B'))

   v1 v2 type
1:  1  3    A
2:  2  4    B

col_in <- c('v2', 'v1')
col_out <- paste0(col_in, '.new')

Accessing 'type' the hard-coded way

dt[, (col_out) := lapply(.SD, function(x){x * min(x[type == 'A'])}), .SDcols = col_in]

produces the expected result:

   v1 v2 type v2.new v1.new
1:  1  3    A      9      1
2:  2  4    B     12      2

However, when accessing 'type' via get()

dt[, (col_out) := lapply(.SD, function(x){x * min(x[get('type') == 'A'])}), .SDcols = col_in]

the expected values for v1.new are in v2.new and vice versa:

   v1 v2 type v2.new v1.new
1:  1  3    A      1      9
2:  2  4    B      2     12

Note: This is a minimal toy example that I distilled from a more complex operation that I'm trying to implement. The name of the 'type' variable is given as an input parameter.


asked Jun 15 '18 15:06

Steffen J.

2 Answers

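Interesting, thanks for sharing! It seems that when get() is used in j, .SD ends up with its columns in the table's order (v1, v2) instead of the order given in .SDcols (v2, v1), so the results are assigned to the wrong names in col_out (a bug?).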
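To check whether your installed data.table version shows this reordering, a quick diagnostic is to return names(.SD) from j with and without a get() call. This is a sketch: the second result may depend on the data.table version, so it is only printed here, not claimed.

```r
library(data.table)

dt <- data.table(v1 = c(1, 2), v2 = c(3, 4), type = c("A", "B"))
col_in <- c("v2", "v1")

# j returns a character vector, so dt[...] returns that vector as-is
sd_plain <- dt[, names(.SD), .SDcols = col_in]

# Same query, but j also calls get(), which makes data.table expose all
# columns to j; compare the order of .SD with the one above
sd_get <- dt[, {
  is_a <- get("type") == "A"  # unused; only here so get() appears in j
  names(.SD)
}, .SDcols = col_in]

print(sd_plain)  # order requested via .SDcols
print(sd_get)    # if this differs, lapply(.SD, ...) results won't line up with col_out
```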

Two ways to avoid this:

Move the type == 'A' part outside the dt[, lapply(...)] call:

referenceRows <- which(dt[, type == 'A'])
# or, when the column name is only known as a string:
referenceRows <- which(dt[, get('type') == 'A'])
dt[, (col_out) := lapply(.SD, function(x){x * min(x[referenceRows])}), .SDcols = col_in]

   v1 v2 type v2.new v1.new
1:  1  3    A      9      1
2:  2  4    B     12      2
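The first approach also fits the case from the question where the column name arrives as an input parameter. A minimal sketch, assuming the same data as the question; scale_by_ref() is a hypothetical helper, not a data.table function. It uses base [[ with the column name instead of get(), so get() never appears in j.

```r
library(data.table)

# scale_by_ref() is a hypothetical helper (not part of data.table).
# For each column in col_in it multiplies the column by its minimum over the
# rows where type_col == ref, and adds the result as "<col>.new" by reference.
scale_by_ref <- function(dt, col_in, type_col, ref = "A") {
  # Evaluate the condition outside j, looking the column up by name
  is_ref <- dt[[type_col]] == ref
  col_out <- paste0(col_in, ".new")
  dt[, (col_out) := lapply(.SD, function(x) x * min(x[is_ref])), .SDcols = col_in]
  invisible(dt)
}

dt <- data.table(v1 = c(1, 2), v2 = c(3, 4), type = c("A", "B"))
scale_by_ref(dt, col_in = c("v2", "v1"), type_col = "type")
```

Since := modifies dt by reference, the new v2.new and v1.new columns are visible in dt after the call without reassigning.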

Alternatively, first create the new columns and then use setnames to make sure that the new columns get the proper column names. Finally, bind the two parts together with cbind:

dtNew <- dt[, lapply(.SD, function(x){x * min(x[type == 'A'])}), .SDcols = col_in]
setnames(dtNew, col_in, col_out)
cbind(dt, dtNew)


   v1 v2 type v2.new v1.new
1:  1  3    A      9      1
2:  2  4    B     12      2

With get(), the values are the same, although the new columns come out in a different order:

dtNew <- dt[, lapply(.SD, function(x){x * min(x[get('type') == 'A'])}), .SDcols = col_in]
setnames(dtNew, col_in, col_out)
cbind(dt, dtNew)


   v1 v2 type v1.new v2.new
1:  1  3    A      1      9
2:  2  4    B      2     12
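A variation on the same idea that skips setnames() and cbind(): take the new column names from the result itself and assign them back with :=. Because lapply(.SD, ...) keeps the name of each column it was computed from, the names stay correct whatever order .SD uses. A minimal sketch, assuming the question's dt and col_in; col_where is a hypothetical variable holding the column name as a string.

```r
library(data.table)

dt <- data.table(v1 = c(1, 2), v2 = c(3, 4), type = c("A", "B"))
col_in <- c("v2", "v1")
col_where <- "type"  # column name only known as a string

# Compute the new values first; each result column keeps the name of the
# .SD column it was computed from
res <- dt[, lapply(.SD, function(x) x * min(x[get(col_where) == "A"])),
          .SDcols = col_in]

# Build the output names from the result's own names, then assign by reference
dt[, (paste0(names(res), ".new")) := as.list(res)]
```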


answered Oct 24 '22 09:10

Marvin Steijaert

Another way is to use a neat R feature called computing on the language (not specific to data.table): instead of get, build the required j argument as a language object with the substitute function.
This also works when grouping.

library(data.table)
dt <- data.table(v1 = c(1,2), v2 = c(3,4), type = c('A', 'B'))
col_in <- c('v2', 'v1')
col_out <- paste0(col_in, '.new')

col_where <- 'type'
qj <- substitute(.col_out := lapply(.SD, function(x){x * min(x[.col_where == 'A'])}),
                 list(.col_out=col_out, .col_where=as.name(col_where)))
print(qj)
#`:=`(c("v2.new", "v1.new"), lapply(.SD, function(x) {
#    x * min(x[type == "A"])
#}))

dt[, eval(qj), .SDcols = col_in][]
#      v1    v2   type v2.new v1.new
#   <num> <num> <char>  <num>  <num>
#1:     1     3      A      9      1
#2:     2     4      B     12      2

More about this feature in the R Language Definition, chapter "Computing on the language".
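The same mechanism works without data.table. A minimal base R sketch (the data frame df is just for illustration) showing substitute() and the equivalent bquote() splicing a column name into an expression, then evaluating that expression against a data frame:

```r
# Build an expression that refers to a column whose name is known only at run time
df <- data.frame(v1 = c(1, 2), v2 = c(3, 4), type = c("A", "B"))
col_where <- "type"

# substitute() replaces COL with the symbol `type`; as.name() turns the
# string into a symbol rather than a character constant
e1 <- substitute(COL == "A", list(COL = as.name(col_where)))

# bquote() does the same, with .() marking the part to splice in
e2 <- bquote(.(as.name(col_where)) == "A")

print(e1)  # type == "A"

# Evaluate the expression with the data frame's columns as variables
is_ref <- eval(e1, df)
scaled <- df$v2 * min(df$v2[is_ref])
```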

answered Oct 24 '22 11:10

jangorecki
