Fast vectorized merge of list of data.frames by row

Most of the questions about merging data.frames in lists on SO don't quite relate to what I'm trying to get across here, but feel free to prove me wrong.

I have a list of data.frames. I would like to "rbind" their rows into new data.frames by row position. In essence, all first rows form one data.frame, all second rows form a second data.frame, and so on. The result would be a list whose length equals the number of rows in each original data.frame. So far, the data.frames are identical in dimensions.
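To make the target shape concrete, here is a minimal sketch with made-up values. The names `toy.list` and `wanted` are hypothetical and not part of the question's data; `wanted` is written out by hand rather than computed.

```r
# Toy input (hypothetical values): two data.frames with identical dimensions.
toy.list <- list(
    data.frame(x = 1:3, y = 4:6),
    data.frame(x = 7:9, y = 10:12)
)

# Desired output, written out by hand: one data.frame per row position,
# each holding that row from every element of the input list.
wanted <- list(
    data.frame(x = c(1L, 7L), y = c(4L, 10L)),  # all first rows
    data.frame(x = c(2L, 8L), y = c(5L, 11L)),  # all second rows
    data.frame(x = c(3L, 9L), y = c(6L, 12L))   # all third rows
)

# One list element per original row, one row per original data.frame.
length(wanted) == nrow(toy.list[[1]])
all(sapply(wanted, nrow) == length(toy.list))
```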

Here's some data to play around with.

    sample.list <- list(
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE))
    )

Here's what I've come up with using the good ol' for loop.

    #solution 1
    my.list <- vector("list", nrow(sample.list[[1]]))
    for (i in 1:nrow(sample.list[[1]])) {
        for (j in 1:length(sample.list)) {
            my.list[[i]] <- rbind(my.list[[i]], sample.list[[j]][i, ])
        }
    }

    #solution 2 (so far my favorite)
    sample.list2 <- do.call("rbind", sample.list)
    my.list2 <- vector("list", nrow(sample.list[[1]]))

    for (i in 1:nrow(sample.list[[1]])) {
        my.list2[[i]] <- sample.list2[seq(from = i, to = nrow(sample.list2), by = nrow(sample.list[[1]])), ]
    }

Can this be improved using vectorization without much brainhurt? Correct answer will contain a snippet of code, of course. "Yes" as an answer doesn't count.

EDIT

    #solution 3 (a variant of solution 2 above)
    ind <- rep(1:nrow(sample.list[[1]]), times = length(sample.list))
    my.list3 <- split(x = sample.list2, f = ind)
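A quick way to convince yourself that solutions 2 and 3 build the same data.frames is to run both on a small stand-in list and compare. This is only a sketch: `small.list`, `stacked`, `by.loop`, `by.split` and the seed are hypothetical names and values, not from the post.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Smaller stand-in for sample.list: 3 data.frames of 4 rows each.
small.list <- lapply(1:3, function(k) {
    data.frame(x = sample(1:100, 4), y = sample(1:100, 4))
})
n.rows <- nrow(small.list[[1]])
stacked <- do.call("rbind", small.list)

# Solution 2: take every n.rows-th row of the stacked data, starting at row i.
by.loop <- vector("list", n.rows)
for (i in seq_len(n.rows)) {
    by.loop[[i]] <- stacked[seq(from = i, to = nrow(stacked), by = n.rows), ]
}

# Solution 3: split on an index that repeats 1..n.rows once per data.frame.
ind <- rep(seq_len(n.rows), times = length(small.list))
by.split <- split(x = stacked, f = ind)

# split() names its elements "1", "2", ...; drop the names before comparing.
same <- all(mapply(function(a, b) isTRUE(all.equal(a, b)),
                   by.loop, unname(by.split)))
same
```

The only visible difference is that `split()` returns a named list, which is why the names are dropped before the comparison.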

BENCHMARKING

I've made my list larger with more rows per data.frame. I've benchmarked the results which are as follows:

    #solution 1
    system.time(for (i in 1:nrow(sample.list[[1]])) {
        for (j in 1:length(sample.list)) {
            my.list[[i]] <- rbind(my.list[[i]], sample.list[[j]][i, ])
        }
    })
       user  system elapsed
     80.989   0.004  81.210

    # solution 2
    system.time(for (i in 1:nrow(sample.list[[1]])) {
        my.list2[[i]] <- sample.list2[seq(from = i, to = nrow(sample.list2), by = nrow(sample.list[[1]])), ]
    })
       user  system elapsed
      0.957   0.160   1.126

    # solution 3
    system.time(split(x = sample.list2, f = ind))
       user  system elapsed
      1.104   0.204   1.332

    # solution Gabor
    system.time(lapply(1:nr, bind.ith.rows))
       user  system elapsed
      0.484   0.000   0.485

    # solution ncray
    system.time(alply(do.call("cbind", sample.list), 1,
                      .fun = matrix, ncol = ncol(sample.list[[1]]), byrow = TRUE,
                      dimnames = list(1:length(sample.list), names(sample.list[[1]]))))
       user  system elapsed
     11.296   0.016  11.365
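For anyone who wants to rerun the comparison, here is a self-contained sketch of a timing harness for solutions 2, 3 and Gabor's. The post does not state how large the benchmarked list was, so `n.frames` and `n.rows` below are assumptions, as are the names `big.list`, `stacked` and the `res.*`/`t.*` variables. Absolute timings will differ by machine and R version.

```r
set.seed(42)  # arbitrary seed

# Hypothetical sizes; the original benchmark sizes are not given in the post.
n.frames <- 20
n.rows <- 200

big.list <- replicate(n.frames, data.frame(
    x = sample(1:1000, n.rows),
    y = sample(1:1000, n.rows),
    capt = sample(0:1, n.rows, replace = TRUE)
), simplify = FALSE)

stacked <- do.call("rbind", big.list)
ind <- rep(seq_len(n.rows), times = n.frames)

# Solution 2: strided row selection from the stacked data.frame.
res.loop <- vector("list", n.rows)
t.loop <- system.time(for (i in seq_len(n.rows)) {
    res.loop[[i]] <- stacked[seq(from = i, to = nrow(stacked), by = n.rows), ]
})

# Solution 3: split() on a repeating row index.
t.split <- system.time(res.split <- split(x = stacked, f = ind))

# Gabor's solution: extract row i from every data.frame, then rbind.
t.bind <- system.time(
    res.bind <- lapply(seq_len(n.rows), function(i) {
        do.call(rbind, lapply(big.list, "[", i, TRUE))
    })
)

# Elapsed seconds side by side.
c(loop = t.loop[["elapsed"]],
  split = t.split[["elapsed"]],
  bind = t.bind[["elapsed"]])
```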

asked Feb 01 '11 13:02

Roman Luštrik

1 Answer

Try this:

    bind.ith.rows <- function(i) do.call(rbind, lapply(sample.list, "[", i, TRUE))
    nr <- nrow(sample.list[[1]])
    lapply(1:nr, bind.ith.rows)
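The `lapply(sample.list, "[", i, TRUE)` part may look cryptic: `lapply()` passes the extra arguments `i` and `TRUE` on to the `[` function, so each data.frame is indexed as `df[i, TRUE]`, that is, row `i` with all columns. A minimal sketch on toy data (`toy.list` and the other names here are hypothetical, not from the answer):

```r
# Toy stand-in for sample.list (hypothetical values).
toy.list <- list(
    data.frame(x = 1:3, y = 4:6),
    data.frame(x = 7:9, y = 10:12)
)

i <- 2

# lapply() forwards i and TRUE to `[`, so each element is indexed as df[i, TRUE].
via.bracket <- lapply(toy.list, "[", i, TRUE)

# The same extraction spelled out with an anonymous function.
via.function <- lapply(toy.list, function(df) df[i, TRUE])

identical(via.bracket, via.function)

# Stacking the extracted rows gives the data.frame for row position i.
second.rows <- do.call(rbind, via.bracket)
second.rows$x  # 2 and 8: the second x value of each input data.frame
```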

answered Oct 20 '22 00:10

G. Grothendieck

Tags: performance, merge, list, dataframe, r
