
Fast way of converting large list to dataframe [duplicate]

I have a huge list (700 elements), each element being a vector of length 16,000. I am looking for an efficient way of converting the list to a data frame with one row per list element, in the following fashion (this is just a mock example):

lst <- list(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))

The end result I am looking for is:

 #  [,1] [,2] [,3]
 #a    1    2    3
 #b    4    5    6
 #c    7    8    9

This is what I have tried, but it isn't working as I wish:

library(data.table)
result = rbindlist(Map(as.data.frame, lst))

Any suggestions? Please keep in mind that my real data has huge dimensions, so I need a reasonably efficient way of doing this operation.

Thank you very much!

Mayou asked Sep 11 '13 17:09



3 Answers

Try this. We assume that all components of L have the same length, n, and that no row names are needed. Each component of L becomes one column of the data frame:

L <- list(a = 1:4, b = 4:1) # test input

n <- length(L[[1]])
DF <- structure(L, row.names = c(NA, -n), class = "data.frame")
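
As a minimal sketch of what the structure() call above produces, the following checks the equal-length assumption first and then inspects the result. The length check and the transposition at the end are additions for illustration, not part of the original answer; the transposition is only relevant if the list elements should end up as rows, as in the question.

```r
L <- list(a = 1:4, b = 4:1)  # test input from the answer

n <- length(L[[1]])
# Added safety check (not in the original answer): every component
# must have the same length, otherwise the data frame is malformed.
stopifnot(all(lengths(L) == n))

# Set the data frame attributes directly instead of calling data.frame().
DF <- structure(L, row.names = c(NA, -n), class = "data.frame")

str(DF)   # each list component is a column
dim(DF)   # n rows, length(L) columns

# If one row per list element is wanted, transpose. Note that this
# goes through as.matrix() and therefore returns a matrix.
M <- t(as.matrix(DF))
dim(M)
```

Because the components become columns, this layout is the transpose of the one shown in the question.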

G. Grothendieck answered Oct 19 '22 10:10



I think

lst <- list(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))
do.call(rbind,lst)

works. I don't know if there's a sneakier/more dangerous/corner-cutting way to do it that's more efficient.

You could also try

m <- matrix(unlist(lst),byrow=TRUE,ncol=length(lst[[1]]))
rownames(m) <- names(lst)
as.data.frame(m)

... maybe it's faster?
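
One way to find out is to time both variants on synthetic data. The sketch below is only an illustration: the sizes n_elem and len are scaled-down assumptions (raise them to 700 and 16000 to match the question), the random data and the names v1, v2, ... are made up, and timings will vary by machine.

```r
set.seed(1)
n_elem <- 200    # assumption: scaled down from the question's 700
len    <- 2000   # assumption: scaled down from the question's 16,000

# Synthetic named list of equal-length numeric vectors.
big <- replicate(n_elem, rnorm(len), simplify = FALSE)
names(big) <- paste0("v", seq_len(n_elem))

# Variant 1: bind the list elements together as rows.
t_rbind <- system.time(m1 <- do.call(rbind, big))

# Variant 2: flatten the list, then fill a matrix row by row.
t_matrix <- system.time({
  m2 <- matrix(unlist(big), byrow = TRUE, ncol = length(big[[1]]))
  rownames(m2) <- names(big)
})

# Both variants should give the same matrix.
stopifnot(isTRUE(all.equal(m1, m2)))

elapsed <- c(rbind = t_rbind[["elapsed"]],
             matrix_unlist = t_matrix[["elapsed"]])
print(elapsed)
```

Passing use.names = FALSE to unlist() skips building a name for every element, which may help in the second variant since matrix() discards those names anyway.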

You may not be able to do very much about speeding up the as.data.frame step. Looking at as.data.frame.matrix to see what could be stripped to make it as bare-bones as possible, it seems that the crux is probably that the columns have to be copied into their own individual list elements:

for (i in ic) value[[i]] <- as.vector(x[, i])

You could try stripping down as.data.frame.matrix to see if you can speed it up, but I'm guessing that this operation is the bottleneck. In order to get around it you have to find some faster way of mapping your data from a list of rows into a list of columns (perhaps an Rcpp solution??).
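
To make the column-copying step concrete, here is a stripped-down sketch in plain R (not the Rcpp route mentioned above). It splits the matrix into a list of columns and sets the data frame attributes by hand. The V1, V2, ... column names are an assumption chosen to mimic what as.data.frame() produces for an unnamed matrix, and no speed-up is claimed; the per-column copy is still there.

```r
lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# One row per list element.
m <- do.call(rbind, lst)

# The step identified as the likely bottleneck: copy each column
# of the matrix into its own list element.
cols <- lapply(seq_len(ncol(m)), function(j) as.vector(m[, j]))
names(cols) <- paste0("V", seq_along(cols))  # assumed column names

# Skip the checks in as.data.frame() and set the attributes directly.
DF <- structure(cols, row.names = rownames(m), class = "data.frame")
print(DF)
```

This skips the validation that as.data.frame() performs, so it is only safe when the input is known to be a well-formed matrix.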

The other thing to consider is whether you really need a data frame -- if your data are of a homogeneous type, you could just keep the results as a matrix. Matrix operations on big data are a lot faster anyway ...
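
As a small illustration of staying with a matrix, the sketch below uses the mock data from the question; the particular summaries computed are just examples.

```r
lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Keep the result as a numeric matrix, one row per list element.
m <- do.call(rbind, lst)

# Vectorised summaries work directly on the matrix.
row_means <- rowMeans(m)   # one mean per original list element
col_sums  <- colSums(m)    # one sum per position in the vectors

print(row_means)
print(col_sums)

# Rows can still be looked up by the original list names.
m["b", ]
```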

Ben Bolker answered Oct 19 '22 08:10



How about just t(as.data.frame(List))?

> A = 1:16000
> List = list()
> for(i in 1:700) List[[i]] = A
> system.time(t(as.data.frame(List)))
   user  system elapsed 
   0.25    0.00    0.25 
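
One thing worth checking with this approach is the class of the result: t() applied to a data frame returns a matrix, so a further as.data.frame() call is needed if a data frame is really required. A minimal sketch using the mock data from the question:

```r
lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Each list element becomes a column, then t() flips it into a row.
res <- t(as.data.frame(lst))

is.matrix(res)       # the transpose of a data frame is a matrix
is.data.frame(res)

# Convert back only if a data frame is actually needed.
DF <- as.data.frame(res)
is.data.frame(DF)
rownames(DF)
```

The extra conversion costs additional time on large inputs, so the timing above covers only the matrix result.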

Señor O answered Oct 19 '22 08:10
