I am trying to apply the scale() function over multiple columns of a data.table to define new columns. It works without grouping, but I get the following error once I add by:
library( data.table )
dt = data.table( id = rep( 1:10, each = 10 ),
A = rnorm( 100, 1, 2 ),
B = runif( 100, 0, 1 ),
C = rnorm( 100, 10, 20 ) )
cols_to_use = c( "A", "B", "C" )
cols_to_define = paste0( cols_to_use, "_std" )
# working
dt[ , ( cols_to_define ) := lapply( .SD, scale ), .SDcols = cols_to_use ]
# not working
dt[ , ( cols_to_define ) := lapply( .SD, scale ), by = id, .SDcols = cols_to_use ]
## Error in `[.data.table`(dt, , `:=`((cols_to_define), lapply(.SD, scale)), :
## All items in j=list(...) should be atomic vectors or lists.
## If you are trying something like j=list(.SD,newcol=mean(colA)) then
## use := by group instead (much quicker), or cbind or merge afterwards.
Any ideas why this works when the by operation is removed?
The issue is with the output of scale, which is a matrix:
dim(scale(dt$A))
#[1] 100 1
So we need to change it to a vector by removing the dim attribute. Either as.vector or c would do it:
dt[ , ( cols_to_define ) := lapply( .SD, function(x)
c(scale(x)) ), by = id, .SDcols = cols_to_use ]
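To see the matrix-versus-vector difference in isolation, here is a minimal base R sketch. The data (x, g) are made up for illustration and are not from the question, and ave() stands in for the grouped data.table call so the snippet runs without any packages:

```r
# Hypothetical example data, not taken from the question.
set.seed(1)
x <- rnorm(20, mean = 5, sd = 3)
g <- rep(1:4, each = 5)

m <- scale(x)
is.matrix(m)          # scale() returns a one-column matrix
dim(m)
names(attributes(m))  # dim, plus the scaled:center and scaled:scale attributes

v <- c(m)             # c() strips the dim attribute; as.vector() does as well
is.null(dim(v))

# Grouped standardisation in base R, analogous to using by = id
x_std <- ave(x, g, FUN = function(z) c(scale(z)))
tapply(x_std, g, mean)  # approximately 0 within each group
tapply(x_std, g, sd)    # 1 within each group
```

The same function(x) c(scale(x)) wrapper is what gets passed to lapply(.SD, ...) in the data.table call above.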
When there is no by, the matrix dim attribute gets dropped while the other attributes are kept.