
data.table: create multiple columns with lapply and .SD [duplicate]

Tags:

r

data.table

I am trying to apply the scale() function over multiple columns of a data.table to define new columns. It works without grouping, but as soon as I add by = id I get the error shown below:

library( data.table )

dt = data.table( id = rep( 1:10, each = 10 ), 
             A = rnorm( 100, 1, 2 ), 
             B = runif( 100, 0, 1 ),
             C = rnorm( 100, 10, 20 ) )


cols_to_use    = c( "A", "B", "C" )
cols_to_define = paste0( cols_to_use, "_std" )

# working
dt[ , ( cols_to_define ) := lapply( .SD, scale ), .SDcols = cols_to_use ]

# not working
dt[ , ( cols_to_define ) := lapply( .SD, scale ), by = id, .SDcols = cols_to_use ]
## Error in `[.data.table`(dt, , `:=`((cols_to_define), lapply(.SD, scale)),  : 
## All items in j=list(...) should be atomic vectors or lists. 
## If you are trying something like j=list(.SD,newcol=mean(colA)) then
## use := by group instead (much quicker), or cbind or merge afterwards.

Any idea why this works once the by operation is removed?

Francesco Grossetti asked Sep 06 '25 05:09



1 Answer

The issue is with the output of scale, which is a one-column matrix rather than a plain vector:

dim(scale(dt$A))
#[1] 100   1

So we need to convert it to a vector by removing the dim attribute. Either as.vector or c will do it:

dt[ , ( cols_to_define ) := lapply( .SD, function(x) 
          c(scale(x)) ), by = id, .SDcols = cols_to_use ]
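The matrix-versus-vector point can be checked in base R alone. The following is a minimal sketch on an arbitrary toy vector (x, s, v1 and v2 are illustrative names, not part of the question's data):

```r
# Toy input; the values are arbitrary
x <- c(2, 4, 6, 8)

# scale() returns a one-column matrix that also carries
# "scaled:center" and "scaled:scale" attributes
s <- scale(x)
is.matrix(s)
dim(s)
names(attributes(s))

# c() and as.vector() both strip those attributes,
# leaving a plain numeric vector
v1 <- c(s)
v2 <- as.vector(s)
is.null(attributes(v1))
identical(v1, v2)
```

Both conversions give the same plain numeric vector here, so the choice between c and as.vector is a matter of taste.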

When there is no by, the matrix's dim attribute gets dropped while the other attributes are kept.
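As a cross-check, the grouped standardization can also be written without scale() at all, which sidesteps the matrix issue. This is only a sketch on a small made-up table; the column names A_manual and A_scale and the seed are assumptions for illustration, and it relies on scale()'s default behavior of centering by the mean and dividing by the standard deviation:

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dt <- data.table(id = rep(1:3, each = 5), A = rnorm(15, 1, 2))

# Standardize within each id by hand: subtract the group mean,
# divide by the group standard deviation
dt[, A_manual := (A - mean(A)) / sd(A), by = id]

# The c(scale(x)) approach from the answer, for comparison
dt[, A_scale := c(scale(A)), by = id]

# The two columns should agree up to floating-point tolerance
all.equal(dt$A_manual, dt$A_scale)
```

The manual form returns a plain vector per group, so := by group accepts it directly.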

akrun answered Sep 07 '25 19:09
