I have a list of the following format:
[[1]]
[[1]]$a
[1] 1

[[1]]$b
[1] 3

[[1]]$c
[1] 5


[[2]]
[[2]]$c
[1] 2

[[2]]$a
[1] 3
There is a predefined list of possible "keys" (a, b, and c, in this case), and each element in the list (a "row") will have values defined for one or more of these keys. I'm looking for a fast way to get from the list structure above to a data.frame, which in this case would look like the following:
a b c
1 1 3 5
2 3 NA 2
Any help would be appreciated!
Appendix
I'm dealing with a table that will have up to 50,000 rows and 3-6 columns, with most of the values specified. I'll be taking the table in from JSON and trying to quickly get it into data.frame structure.
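As an aside on the JSON step: if the input is a JSON array of objects, the jsonlite package (an assumption, since the question doesn't name a parser) will simplify it straight to a data.frame, filling missing keys with NA:
library(jsonlite)
json <- '[{"a":1,"b":3,"c":5},{"c":2,"a":3}]'
fromJSON(json)
#   a  b c
# 1 1  3 5
# 2 3 NA 2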
Here's some code to create a sample list of the scale with which I'll be working:
ids <- c("a", "b", "c")
createList <- function(approxSize=100){
  set.seed(1234)
  fifth <- round(approxSize/5)
  out <- list()
  ## Recycle five template "rows" (some with keys missing) until the
  ## list reaches approximately the requested size
  out[1:(fifth*5)] <- rep(
    list(list(a=1, b=2, c=3),
         list(a=3, b=4, c=5),
         list(a=7, c=9),
         list(c=6, a=8, b=3),
         list(b=6)),
    fifth)
  out
}
Just create a list with approxSize = 50000 to test the performance on a list of this size.
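As a quick sanity check on the generator (a small sketch; the default size is fine for this):
ll <- createList()   # default approxSize = 100
length(ll)           # 100
str(ll[[3]])         # a "row" with only a and c defined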
Here's my initial thought. It doesn't speed up your approach, but it does simplify the code considerably:
# makeDF <- function(List, Names) {
#   m <- t(sapply(List, function(X) unlist(X)[Names]))
#   as.data.frame(m)
# }
## vapply() is a bit faster than sapply()
makeDF <- function(List, Names) {
  m <- t(vapply(List,
                FUN = function(X) unlist(X)[Names],
                FUN.VALUE = numeric(length(Names))))
  as.data.frame(m)
}
## Test timing with a 50k-item list
ll <- createList(50000)
nms <- c("a", "b", "c")
system.time(makeDF(ll, nms))
# user system elapsed
# 0.47 0.00 0.47
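As a sanity check on the two-row example from the question, makeDF() reproduces the expected data.frame, with NA wherever a key is missing:
makeDF(list(list(a=1, b=3, c=5), list(c=2, a=3)), nms)
#   a  b c
# 1 1  3 5
# 2 3 NA 2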
Here is a short answer, though I doubt it will be very fast.
> library(plyr)
> x <- list(list(a=1, b=3, c=5), list(c=2, a=3))
> rbind.fill(lapply(x, as.data.frame))
a b c
1 1 3 5
2 3 NA 2
If you know the possible keys beforehand and you are dealing with large data, perhaps using data.table and set will be fast:
library(data.table)
cc <- createList(50000)
system.time({
  ## Pre-allocate an all-NA data.table with one column per key
  nas <- rep.int(NA_real_, length(cc))
  DT <- setnames(as.data.table(replicate(length(ids), nas, simplify = FALSE)), ids)
  ## Fill in each row by reference, one defined key at a time
  for(xx in seq_along(cc)){
    .n <- names(cc[[xx]])
    for(j in .n){
      set(DT, i = xx, j = j, value = cc[[xx]][[j]])
    }
  }
})
# user system elapsed
# 0.68 0.01 0.70
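Since createList() cycles through five fixed templates, the first five rows of DT are known in advance, which makes for an easy correctness check:
head(DT, 5)
#     a  b  c
# 1:  1  2  3
# 2:  3  4  5
# 3:  7 NA  9
# 4:  8  3  6
# 5: NA  6 NA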
full <- c('a', 'b', 'c')
system.time({
  for(xx in seq_along(cc)) {
    mm <- setdiff(full, names(cc[[xx]]))
    ## convert any element whose names are incomplete or out of order
    if(length(mm) || !all(names(cc[[xx]]) == full)){
      cc[[xx]] <- as.data.table(cc[[xx]])
      # if required, add the missing columns
      if(length(mm)){
        cc[[xx]][, (mm) := as.list(rep(NA_real_, length(mm)))]
      }
      # put columns in the correct order
      setcolorder(cc[[xx]], full)
    }
  }
  cdt <- rbindlist(cc)
})
# user system elapsed
# 21.83 0.06 22.00
This second solution has been left here to show how data.table can be used poorly.
I know this is an old question, but I just came across it and it's excruciating not to see the simplest solution I'm aware of. So here it is (simply specify 'fill=TRUE' in rbindlist):
library(data.table)
ll <- list(list(a=1, b=3, c=5), list(c=2, a=3))
rbindlist(ll, fill=TRUE)
# a b c
# 1: 1 3 5
# 2: 3 NA 2
I don't know if this is the fastest way, but I'd be willing to bet that it competes, given data.table's thoughtful design and extremely good performance on a lot of other tasks.
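For what it's worth, the timing harness from the question's appendix applies directly; a sketch (no numbers claimed, since they vary by machine):
cc <- createList(50000)
system.time(rbindlist(cc, fill = TRUE))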