
rbind `data.tables` and preserve key

I'm looking for behavior similar to inserting into an already keyed SQL table, where the newly added rows are inserted into the existing key. For example, in this case:

dt <- data.table(a=1:10)
setkey(dt, a)
tables()
#      NAME NROW MB COLS KEY
# [1,] dt     10 1  a    a  
dt.2 <- rbindlist(list(dt, data.table(a=1:5)))
tables()
#      NAME NROW MB COLS KEY
# [1,] dt     10 1  a    a  
# [2,] dt.2   15 1  a      

I would like the option of having dt.2 "inherit" the key from dt (updated with the incremental data, obviously), instead of ending up with no key, which is what actually happened.

I was at first a bit surprised at the loss of the key in the first place, but that is clearly the documented behavior.

Is there a clean way of doing this without calling setkey after each rbind/rbindlist?

asked Jan 13 '14 17:01 by BrodieG


1 Answer

Essentially, data.table doesn't currently support row insert at all, let alone into a keyed table. rbind creates a new data.table, so it is neither fast nor memory efficient.

A similar question is here:

How to delete a row by reference in data.table?

Currently, the typical workflow is to load files from disk using fread and rbindlist them together, or load data from a database using RODBC or similar.
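As a rough illustration of that workflow, here is a minimal sketch. The temporary directory and the file names part1.csv and part2.csv are made up for the example, and fwrite is used only to create some input files (it assumes a data.table version that provides it). The key is set once, after all the pieces have been bound together.

```r
library(data.table)

# Create two small CSV files in a temporary directory so the sketch is
# self-contained. The file names are hypothetical.
tmp <- tempfile("parts_")
dir.create(tmp)
fwrite(data.table(a = 6:10), file.path(tmp, "part1.csv"))
fwrite(data.table(a = 1:5),  file.path(tmp, "part2.csv"))

# Read every file with fread and bind the results in one rbindlist call.
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
combined <- rbindlist(lapply(files, fread))

# Key the combined table once; this sorts it by reference.
setkey(combined, a)
key(combined)
```

Binding everything in a single rbindlist call and keying once avoids re-sorting after every incremental append.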

We'd like to add fast row insert, but it isn't done yet.
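In the meantime, the re-keying step can at least be hidden behind a small wrapper. The helper below, rbind_keep_key, is a hypothetical name and not part of data.table. It still builds a new table and re-sorts it, so it is a convenience rather than a true keyed insert.

```r
library(data.table)

# Hypothetical helper (not part of data.table): bind the tables, then
# re-apply the key of the first table to the combined result.
rbind_keep_key <- function(...) {
  tabs <- list(...)
  out <- rbindlist(tabs)            # creates a new, unkeyed data.table
  k <- key(tabs[[1L]])              # NULL if the first table has no key
  if (!is.null(k)) setkeyv(out, k)  # sorts the new table by reference
  out
}

dt <- data.table(a = 1:10)
setkey(dt, a)
dt.2 <- rbind_keep_key(dt, data.table(a = 1:5))
key(dt.2)
```

The original dt is left untouched, and the cost of the sort is paid on every call, so for many small appends it is likely cheaper to collect the pieces in a list and bind them once.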

answered Oct 13 '22 00:10 by Matt Dowle