 

Combine a list of data.tables


Is there a specific method for combining a list of data.tables in R?

I have a list of ~20 data.tables, each with around 1 million rows, and would like to combine them into one data.table with 20 million rows.

I've been doing it with

Reduce("rbind", myList)   # myList: my list of ~20 data.tables

but it takes a while.

Thanks!

asked Sep 03 '12 17:09 by user680111





2 Answers

See ?rbindlist and these related questions (easier to find when you know what to search for!):

data.table questions and answers containing rbindlist
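A minimal sketch of rbindlist on a list of tables, using toy data (a, b and c3 are made-up names for illustration only). It exercises two arguments documented in ?rbindlist: use.names = TRUE binds columns by name rather than by position, and fill = TRUE pads columns that are missing from some tables with NA. Printed layout may vary slightly between data.table versions.

```r
library(data.table)

# Two small tables with the same columns, but in a different order
a <- data.table(x = 1:2, y = c("p", "q"))
b <- data.table(y = c("r", "s"), x = 3:4)

# use.names = TRUE matches columns by name instead of by position
combined <- rbindlist(list(a, b), use.names = TRUE)
print(combined)

# fill = TRUE pads columns missing from some tables with NA
c3 <- data.table(x = 5L)
filled <- rbindlist(list(a, c3), fill = TRUE)
print(filled)
```

Being explicit about use.names is a reasonable habit when the tables may not share a column order, since binding by position would silently mix columns.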

answered Mar 16 '23 04:03 by Matt Dowle



Using do.call appears to be about 10x faster in this made-up example:

library(data.table)

x1 <- data.table(x = runif(1e6), y = runif(1e6))
x2 <- data.table(x = runif(1e6), y = runif(1e6))

#20 data.tables all of length 1e6
yourList <- list(x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2)

system.time(out1 <- Reduce("rbind", yourList))
#-----
#    user  system elapsed
#    3.37    3.03    6.43

system.time(out2 <- do.call("rbind", yourList))
#-----
#    user  system elapsed
#    0.33    0.36    0.68

all.equal(out1,out2)
#-----
# [1] TRUE
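One plausible explanation for the gap: Reduce() binds the list pairwise, so each intermediate result gets copied again at the next step, while do.call() makes a single rbind() call over all 20 tables. The arithmetic below is a rough cost model under a simplifying assumption (every rbind() call writes every row of its result once); it ignores allocation and garbage-collection overhead, and the variable names are illustrative only.

```r
# Rough cost model (assumption): each rbind() call writes all rows of its result
n_tables  <- 20
rows_each <- 1e6

# Reduce("rbind", l): step k (k = 2..n) produces a table of k * rows_each rows,
# so the total rows written is the sum over all steps
reduce_rows <- sum((2:n_tables) * rows_each)

# do.call("rbind", l): one call that writes each row exactly once
single_rows <- n_tables * rows_each

# Ratio of rows written under the two strategies
ratio <- reduce_rows / single_rows
print(c(reduce = reduce_rows, single = single_rows, ratio = ratio))
```

Under this model Reduce writes 209 million rows versus 20 million, a ratio of about 10.45, which is in the same ballpark as the timings above. Treat the agreement as suggestive rather than as proof of where the time actually goes, and note that the copying cost grows quadratically with the number of tables.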

Edit: incorporating Matt's answer

I did not realize data.table had a dedicated function for this task. As is par for the course with data.table, it is quite fast. Here is the relevant timing:

system.time(out3 <- rbindlist(yourList))
#-----
#    user  system elapsed
#    0.07    0.03    0.11

all.equal(out1,out3)
#-----
# [1] TRUE
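When the tables come from different sources, it can help to record where each row came from. A minimal sketch using rbindlist's idcol argument, assuming a small named list (tables, jan, feb and mar are hypothetical names standing in for the question's 20 tables):

```r
library(data.table)

# Hypothetical named list standing in for the ~20 tables in the question
tables <- list(
  jan = data.table(value = c(1, 2)),
  feb = data.table(value = 3),
  mar = data.table(value = c(4, 5, 6))
)

# idcol adds a column recording which list element each row came from;
# for a named list it holds the names, otherwise the integer positions
out <- rbindlist(tables, idcol = "source")
print(out)
```

The id column is placed first, so the combined table can later be split or grouped by its original source, e.g. out[, .N, by = source].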
answered Mar 16 '23 05:03 by Chase
