Combine a list of data.tables


Is there a specific method for combining a list of data.tables in R?

I have a list of ~20 data.tables, each with around 1 million rows, and would like to combine them into one data.table with 20 million rows.

I've been doing it with

Reduce('rbind', data.table) 

but it takes a while.


user680111 asked Sep 03 '12 17:09


2 Answers

See ?rbindlist and the related questions about it (easier to find when you know what to search for!).
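For readers who have not opened the help page yet, here is a minimal sketch of what a call to rbindlist can look like. The tables and the names `parts`, `same_cols`, and `combined` are made up for illustration and are not from the original answer; the `fill` and `idcol` arguments are assumed to be available, which should hold for any reasonably recent data.table release.

```r
library(data.table)

# Hypothetical small tables, invented for this example
parts <- list(
  a = data.table(id = 1:2, value = c(0.1, 0.2)),
  b = data.table(id = 3:4, value = c(0.3, 0.4)),
  c = data.table(id = 5L, value = 0.5, note = "extra column")
)

# Simplest case: every table has the same columns in the same order
same_cols <- rbindlist(parts[c("a", "b")])

# use.names = TRUE matches columns by name rather than position,
# fill = TRUE pads columns missing from some tables with NA,
# idcol = "source" adds a column holding each row's list element name
combined <- rbindlist(parts, use.names = TRUE, fill = TRUE, idcol = "source")
```

For the case in the question (20 tables with identical columns), the plain `rbindlist(yourList)` form should be all that is needed; the extra arguments only matter when the tables differ or when the origin of each row has to be kept.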

Matt Dowle answered Mar 16 '23 04:03

Using do.call appears to be about 10x faster than Reduce with this made-up example:

library(data.table)

x1 <- data.table(x = runif(1e6), y = runif(1e6))
x2 <- data.table(x = runif(1e6), y = runif(1e6))

# 20 data.tables all of length 1e6
yourList <- list(x1, x2, x1, x2, x1, x2, x1, x2, x1, x2, x1, x2, x1, x2, x1, x2, x1, x2, x1, x2)

system.time(out1 <- Reduce("rbind", yourList))
#-----
   user  system elapsed
   3.37    3.03    6.43

system.time(out2 <- do.call("rbind", yourList))
#-----
   user  system elapsed
   0.33    0.36    0.68

all.equal(out1, out2)
#-----
[1] TRUE
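A plausible explanation for the gap, sketched on a tiny example: Reduce folds the list pairwise, so every step copies the growing intermediate result again, whereas do.call hands all the tables to rbind in a single call. With 20 tables of 1 million rows each, the pairwise route copies roughly 2 + 3 + ... + 20 = 209 million rows in total, against 20 million for the single call, which is in line with the roughly 10x difference in the timings. The names `a`, `b`, `c3`, and `small_list` are invented for this illustration.

```r
library(data.table)

# Three tiny hypothetical tables
a  <- data.table(x = 1:2)
b  <- data.table(x = 3:4)
c3 <- data.table(x = 5:6)
small_list <- list(a, b, c3)

# Reduce evaluates rbind(rbind(a, b), c3): one new intermediate table per step
pairwise <- Reduce(rbind, small_list)

# do.call evaluates rbind(a, b, c3): one call that sees every table at once
single_call <- do.call(rbind, small_list)

# Both routes produce the same rows; only the amount of copying differs
identical(pairwise$x, single_call$x)
```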

Edit - to incorporate Matt's answer

I did not realize data.table had a specific function for this task. Par for the course, it is quite fast. Here is the relevant timing:

system.time(out3 <- rbindlist(yourList))
#-----
   user  system elapsed
   0.07    0.03    0.11

all.equal(out1, out3)
#-----
[1] TRUE

Chase answered Mar 16 '23 05:03
