
Potential problems from over-allocating truelength more than 1000 times

Tags: r, data.table

tl;dr: What are potential problems after a truelength over-allocation warning?

Recently I did something stupid like this:

m <- matrix(seq_len(1e4),nrow=10)

library(data.table)
DT <- data.table(id=rep(1:2,each=5),m)
DT[,id2:=id]

#Warning message:
#  In `[.data.table`(DT, , `:=`(id2, id)) :
#  tl (2002) is greater than 1000 items over-allocated (ncol = 1001). 
#     If you didn't set the datatable.alloccol option very large, 
#     please report this to datatable-help including the result of sessionInfo().

DT[,lapply(.SD,mean),by=id2]

After some searching, it became apparent that the warning results from adding a column by reference to a data.table with too many columns. I also found some rather technical explanations (e.g., this), which I probably don't fully understand.

I know that I can avoid the issue (e.g., use data.table(id=rep(1:2,each=5),stack(as.data.frame(m)))), but I wonder if I should expect problems subsequent to such a warning (other than the obvious performance disadvantage from working with a wide format data.table).

R version 2.15.3 (2013-03-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
  [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                    LC_TIME=German_Germany.1252    

attached base packages:
  [1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
  [1] data.table_1.8.8 fortunes_1.5-0  
Roland asked Mar 15 '13 15:03


1 Answer

Good question. By default in v1.8.8 :

> options()$datatable.alloccol
max(100, 2 * ncol(DT))

That's probably not the best default. Try changing it :

options(datatable.alloccol = quote(max(100L, ncol(DT)+64L)))

UPDATE: I've now changed the default in v1.8.9 to that.

That option just controls how many spare column pointer slots are allocated so that := can add columns by reference.
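To see those spare slots, here is a minimal sketch that compares length() with truelength() (documented in ?truelength). The small table and its column names are made up for illustration, and the actual numbers depend on the data.table version and on the datatable.alloccol setting, so no output is shown:

```r
library(data.table)

# Hypothetical small table, only for illustration
DT <- data.table(id = rep(1:2, each = 5), x = seq_len(10))

n_used  <- length(DT)      # columns currently in use
n_slots <- truelength(DT)  # column pointer slots allocated

# Spare slots that := can fill before the column pointer vector must be reallocated
n_slots - n_used

# Adding a column by reference uses one of the spare slots;
# truelength is expected to stay the same while spare slots remain
DT[, id2 := id]
length(DT)
truelength(DT)
```

If truelength(DT) is far larger than length(DT), as in the warning from the question (tl of 2002 for 1001 columns), the table just carries more spare pointer slots than it needs.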


From NOTES in NEWS for v1.8.9

  • The default for datatable.alloccol has changed from max(100L, 2L*ncol(DT)) to max(100L, ncol(DT)+64L). And a pointer to ?truelength has been added to an error message as suggested and thanks to Roland :
    Potential problems from over-allocating truelength more than 1000 times
Matt Dowle answered Oct 05 '22 23:10