Most of the questions on SO about merging data.frames in lists don't quite relate to what I'm trying to get across here, but feel free to prove me wrong.
I have a list of data.frames. I would like to rbind their rows into new data.frames by row position: all first rows form one data.frame, all second rows form a second data.frame, and so on. The result would be a list whose length equals the number of rows in each original data.frame. For now, all the data.frames have identical dimensions.
Here's some data to play around with.
sample.list <- list(
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE))
)
Here's what I've come up with using the good ol' for loop.
# solution 1
my.list <- vector("list", nrow(sample.list[[1]]))
for (i in 1:nrow(sample.list[[1]])) {
  for (j in 1:length(sample.list)) {
    my.list[[i]] <- rbind(my.list[[i]], sample.list[[j]][i, ])
  }
}

# solution 2 (so far my favorite)
sample.list2 <- do.call("rbind", sample.list)
my.list2 <- vector("list", nrow(sample.list[[1]]))
for (i in 1:nrow(sample.list[[1]])) {
  my.list2[[i]] <- sample.list2[seq(from = i, to = nrow(sample.list2), by = nrow(sample.list[[1]])), ]
}
Can this be improved using vectorization without much brainhurt? Correct answer will contain a snippet of code, of course. "Yes" as an answer doesn't count.
EDIT
# solution 3 (a variant of solution 2 above)
ind <- rep(1:nrow(sample.list[[1]]), times = length(sample.list))
my.list3 <- split(x = sample.list2, f = ind)
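To see why the index in solution 3 works, here is a minimal sketch on toy data. The names toy.list, stacked and by.row are made up for illustration and are not part of the original code. After rbind, row i of frame j sits at position (j - 1) * nr + i, so repeating 1:nr once per frame labels every stacked row with its original row number.

```r
# Toy input (hypothetical): 3 data frames with 4 rows each
set.seed(7)
toy.list <- replicate(
  3,
  data.frame(x = sample(1:100, 4), y = sample(1:100, 4)),
  simplify = FALSE
)

nr <- nrow(toy.list[[1]])  # rows per data frame
k  <- length(toy.list)     # number of data frames

# Stack all frames: rows 1..nr come from frame 1, nr+1..2*nr from frame 2, ...
stacked <- do.call(rbind, toy.list)

# Label each stacked row with its original row number: 1 2 3 4 1 2 3 4 1 2 3 4
ind <- rep(seq_len(nr), times = k)

# split() groups the stacked rows by that label, keeping frame order within each group
by.row <- split(stacked, ind)
```

Each element of by.row then holds one row from every frame, in the order the frames appear in the list.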
BENCHMARKING
I've made my list larger with more rows per data.frame. I've benchmarked the results which are as follows:
# solution 1
system.time(for (i in 1:nrow(sample.list[[1]])) {
  for (j in 1:length(sample.list)) {
    my.list[[i]] <- rbind(my.list[[i]], sample.list[[j]][i, ])
  }
})
   user  system elapsed
 80.989   0.004  81.210

# solution 2
system.time(for (i in 1:nrow(sample.list[[1]])) {
  my.list2[[i]] <- sample.list2[seq(from = i, to = nrow(sample.list2), by = nrow(sample.list[[1]])), ]
})
   user  system elapsed
  0.957   0.160   1.126

# solution 3
system.time(split(x = sample.list2, f = ind))
   user  system elapsed
  1.104   0.204   1.332

# solution Gabor
system.time(lapply(1:nr, bind.ith.rows))
   user  system elapsed
  0.484   0.000   0.485

# solution ncray
system.time(alply(do.call("cbind", sample.list), 1, .fun = matrix,
                  ncol = ncol(sample.list[[1]]), byrow = TRUE,
                  dimnames = list(1:length(sample.list), names(sample.list[[1]]))))
   user  system elapsed
 11.296   0.016  11.365
The function rbind() is slow, particularly as the data frame gets bigger. You should never use it in a loop. The right way to do it is to initialize the output object at its final size right from the start and then simply fill it in with each turn of the loop.
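A minimal sketch of that preallocation idea, on toy data rather than the sample list from the question. The names toy.list, template, block and out are hypothetical. Each output data frame is created at its final size (one row per input frame) and its rows are overwritten in place, so nothing grows inside the loop.

```r
# Toy input (hypothetical): 3 data frames with 4 rows each
set.seed(1)
toy.list <- replicate(
  3,
  data.frame(x = sample(1:100, 4), y = sample(1:100, 4)),
  simplify = FALSE
)

nr <- nrow(toy.list[[1]])  # rows per data frame
k  <- length(toy.list)     # number of data frames

# Template with k rows and the right column types; its values are placeholders
template <- toy.list[[1]][rep(1, k), ]
rownames(template) <- NULL

# Preallocate the output list, then fill each preallocated block row by row
out <- vector("list", nr)
for (i in seq_len(nr)) {
  block <- template
  for (j in seq_len(k)) {
    block[j, ] <- toy.list[[j]][i, ]  # row i of frame j goes to row j of block i
  }
  out[[i]] <- block
}
```

This only illustrates the fill-in pattern. Whether it beats the other approaches on large inputs would need to be timed, since row-wise assignment into a data frame carries its own overhead.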
Try this:
bind.ith.rows <- function(i) do.call(rbind, lapply(sample.list, "[", i, TRUE))
nr <- nrow(sample.list[[1]])
lapply(1:nr, bind.ith.rows)
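For readers unfamiliar with passing "[" as a function: lapply(sample.list, "[", i, TRUE) calls df[i, TRUE] on every data frame, which is row i with all columns. The sketch below runs the same idea on a small toy list (toy.list and by.row are made-up names, not from the question) so the result is easy to inspect.

```r
# Toy input (hypothetical): 3 data frames with 4 rows each
set.seed(42)
toy.list <- replicate(
  3,
  data.frame(x = sample(1:100, 4), capt = sample(0:1, 4, replace = TRUE)),
  simplify = FALSE
)

# Pull row i out of every frame with "[" and stack the pieces with rbind
bind.ith.rows <- function(i) do.call(rbind, lapply(toy.list, "[", i, TRUE))

nr <- nrow(toy.list[[1]])
by.row <- lapply(seq_len(nr), bind.ith.rows)

# by.row[[2]] holds the second row of each of the 3 frames
by.row[[2]]
```

Note that df[i, TRUE] drops to a plain vector when the data frame has a single column, so this sketch assumes at least two columns, as in the sample data.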