
setDT() is not adding additional columns

Tags: r, data.table

I am using setDT() to add additional columns to a data.table, but the columns from

setDT(mydata)[, paste0('F2_E',2:30) := lapply(.SD, function(x) log(value/x)), .SDcols = 32:60][]

are not being added when I run this script:

library(data.table)
library(zoo)
date = seq(as.Date("2016-01-01"),as.Date("2016-05-10"),"day")
value =seq(1,131,1)
mydata = data.frame (date, value)
mydata
setDT(mydata)[, paste0('F1',2:30) := lapply(2:30, function(x) rollmeanr(value, x, fill = rep(NA,x-1)) ),][]
setDT(mydata)[, paste0('F2',2:30) := lapply(2:30, function(x) rollapply(value,x,FUN="median",align="right",fill=NA))][]
setDT(mydata)[, paste0('F1_E',2:30) := lapply(.SD, function(x) log(value/x)     ), .SDcols = 3:31][]
setDT(mydata)[, paste0('F2_E',2:30) := lapply(.SD, function(x) log(value/x)), .SDcols = 32:60][]
rbind(colnames(mydata))


rbind(colnames(mydata))
     [,1]   [,2]    [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]  [,10] [,11]  [,12]  [,13]  [,14]  [,15]  [,16]  [,17]  [,18]  [,19]  [,20]  [,21]  [,22]  [,23]  [,24]  [,25]  [,26]  [,27] 
[1,] "date" "value" "F12" "F13" "F14" "F15" "F16" "F17" "F18" "F19" "F110" "F111" "F112" "F113" "F114" "F115" "F116" "F117" "F118" "F119" "F120" "F121" "F122" "F123" "F124" "F125" "F126"
     [,28]  [,29]  [,30]  [,31]  [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40]  [,41]  [,42]  [,43]  [,44]  [,45]  [,46]  [,47]  [,48]  [,49]  [,50]  [,51]  [,52]  [,53]  [,54] 
[1,] "F127" "F128" "F129" "F130" "F22" "F23" "F24" "F25" "F26" "F27" "F28" "F29" "F210" "F211" "F212" "F213" "F214" "F215" "F216" "F217" "F218" "F219" "F220" "F221" "F222" "F223" "F224"
     [,55]  [,56]  [,57]  [,58]  [,59]  [,60]  [,61]   [,62]   [,63]   [,64]   [,65]   [,66]   [,67]   [,68]   [,69]    [,70]    [,71]    [,72]    [,73]    [,74]    [,75]    [,76]    [,77]   
[1,] "F225" "F226" "F227" "F228" "F229" "F230" "F1_E2" "F1_E3" "F1_E4" "F1_E5" "F1_E6" "F1_E7" "F1_E8" "F1_E9" "F1_E10" "F1_E11" "F1_E12" "F1_E13" "F1_E14" "F1_E15" "F1_E16" "F1_E17" "F1_E18"
     [,78]    [,79]    [,80]    [,81]    [,82]    [,83]    [,84]    [,85]    [,86]    [,87]    [,88]    [,89]   
[1,] "F1_E19" "F1_E20" "F1_E21" "F1_E22" "F1_E23" "F1_E24" "F1_E25" "F1_E26" "F1_E27" "F1_E28" "F1_E29" "F1_E30"

As you can see, there are no F2_E2, F2_E3, etc. columns.

Why would those columns not be added?

Asked by user3022875 (Dec 06 '25 21:12)



Short answer:

Call setDT(mydata) once, on its own line. Then do all your assignments on mydata directly, i.e. mydata[, ... := ...].
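Applied to the script from the question, that could look like the sketch below. It assumes data.table and zoo are installed. One small deviation from the original: the rolling mean uses fill = NA (pad the leading incomplete windows with NA, as the rollapply() call already does) instead of fill = rep(NA, x-1).

```r
library(data.table)
library(zoo)

date   <- seq(as.Date("2016-01-01"), as.Date("2016-05-10"), "day")
value  <- seq(1, 131, 1)
mydata <- data.frame(date, value)

setDT(mydata)  # convert to data.table by reference, once

# From here on, every := goes through the plain name `mydata`, so if
# data.table has to re-over-allocate, it can assign the result back to it.
mydata[, paste0("F1", 2:30) := lapply(2:30, function(x) rollmeanr(value, x, fill = NA))]
mydata[, paste0("F2", 2:30) := lapply(2:30, function(x)
  rollapply(value, x, FUN = "median", align = "right", fill = NA))]
mydata[, paste0("F1_E", 2:30) := lapply(.SD, function(x) log(value / x)), .SDcols = 3:31]
mydata[, paste0("F2_E", 2:30) := lapply(.SD, function(x) log(value / x)), .SDcols = 32:60]

ncol(mydata)  # 2 original columns + 4 * 29 new ones
```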

Additionally, if you're going to add a lot of columns, use alloc.col() to over-allocate more column slots up front (until the next release, v1.9.8), i.e.:

setDT(mydata)
truelength(mydata) # [1] 100
alloc.col(mydata, 1000L)
truelength(mydata) # [1] 1000

In the current development version (v1.9.7), we've increased the default over-allocation to 1024, so this should happen extremely rarely.


A slightly more detailed explanation:

This happens because data.table over-allocates column pointer slots when it is created, and the default over-allocation is 100 slots. You can check this with truelength(); see ?truelength.

library(data.table)
mydata = data.frame(x = 1, y = 2)
setDT(mydata)      ## convert to data.table by reference
length(mydata)     ## equals the columns assigned
# [1] 2
truelength(mydata) ## total number of column slots allocated
# [1] 100

Let's add 30 more columns the way you did.

setDT(mydata)[, paste0("z", 1:30) := 1L]
length(mydata)     ## [1] 32
truelength(mydata) ## [1] 100

And another 30.

setDT(mydata)[, paste0("z", 31:60) := 1L]
length(mydata)     ## [1] 62
truelength(mydata) ## [1] 100

And another 30.

setDT(mydata)[, paste0("z", 61:90) := 1L]
length(mydata)     ## [1] 92
truelength(mydata) ## [1] 100

Now, the next time we do this, we have to add 30 more columns, but only 8 slots are free. So we need to create another object with even more over-allocated slots, copy all columns currently in mydata into it, and finally assign it back to mydata. This is handled internally and automatically, so the user doesn't have to keep track of it. So the next time we do:

setDT(mydata)[, paste0("z", 91:120) := 1L]

The function [.data.table realises it needs to over-allocate again, does so, and adds the new columns to the new object. The problem is assigning this new object back to mydata, which lives in the parent frame of [.data.table. That is done with an assign() statement, which needs the variable's name (as a character string), and setDT(mydata) is a function call, not a name. So the re-assignment step fails, and the over-allocated object with the new columns is never reflected back in the original mydata. If you had written mydata[, paste0(..) := ...] instead, the input mydata would be a name, which can be used to assign the over-allocated result back to the original object; that's why the suggestion from @thelatemail works.
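The write-back problem can be illustrated in plain R, without data.table. The function grow_and_write_back() below is hypothetical and only an analogy, not data.table's actual source: it builds a larger copy of its argument and uses substitute() plus assign() to rebind the caller's variable, which is only possible when the argument was passed as a bare name.

```r
# Hypothetical helper (NOT data.table code): builds a bigger copy of `x` and
# tries to write it back to the caller, similar in spirit to what
# [.data.table must do after re-over-allocating.
grow_and_write_back <- function(x, extra) {
  caller <- parent.frame()  # environment the call came from
  target <- substitute(x)   # unevaluated argument: a name or a call
  result <- c(x, extra)     # a new, larger object
  if (is.name(target)) {
    # `x` was a plain variable name, so assign() can rebind it in the caller
    assign(as.character(target), result, envir = caller)
    return(invisible(TRUE))
  }
  # `x` was a call such as identity(v): there is no name to rebind,
  # so the larger object is silently discarded
  invisible(FALSE)
}

v <- 1:3
grow_and_write_back(v, 4:5)           # v is rebound to 1:5
grow_and_write_back(identity(v), 6L)  # returns FALSE; v is still 1:5
```

Here identity(v) plays the role of setDT(mydata): the value is computed, but nothing tells the function which variable to update.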

If this is all too advanced, just upgrade to the devel version; there this is very unlikely to happen (unless you want more than 1024 columns in your data.table).


I've filed #1731 to remind us to come back to this and see if there are other ways to get around this case.

Answered by Arun (Dec 09 '25 09:12)