
How do I convert my list of lists to a usable data.frame (for printing out a table)?

Tags:

r

I have a list of unnamed lists that I need to convert into a usable data.frame. For the most part, each inner list has the same element names, but some have elements that others lack. Each list should become a row in my data.frame and each variable name a column; where a list lacks a particular variable, the data.frame should contain NA.

In my example this_list is what I'm working with and this_df is what I would like to have. I've tried various ways to unlist and convert to data.frame, but my column names just become repeated and I get only 1 observation. Thank you.

this_list <- list(list(
  Name = "One",
  A = 2,
  B = 3,
  C = 4,
  D = 5
),
list(
  Name = "Two",
  A = 5,
  B = 2,
  C = 1
))


this_df <- data.frame(Name=c("One","Two"),
                      A=c(2,5),
                      B=c(3,2),
                      C=c(4,1),
                      D=c(5,NA))
Nickerbocker asked Dec 07 '22 15:12


1 Answer

This is a task for which people frequently reach for dplyr::bind_rows or data.table::rbindlist. In base R, if every list element has the same names in the same order, a quick solution is do.call(rbind, ...):

do.call(rbind, list(this_list[[1]][1:4], this_list[[2]]))
#>      Name  A B C
#> [1,] "One" 2 3 4
#> [2,] "Two" 5 2 1

This returns a matrix of mode list rather than a data.frame, but it can be cleaned up fairly easily.
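One possible cleanup, sketched in base R: pull each column out of the list-matrix, flatten it with unlist, and assemble the columns into a data.frame. The names rows, m, cols and df are illustrative, and the sketch assumes every cell holds a single atomic value.

```r
# Two consistent rows (same names, same order), as in the example above
rows <- list(
  list(Name = "One", A = 2, B = 3, C = 4),
  list(Name = "Two", A = 5, B = 2, C = 1)
)

m <- do.call(rbind, rows)    # a 2 x 4 matrix of mode list

# Flatten each column of the list-matrix into an atomic vector
cols <- lapply(colnames(m), function(nm) unlist(m[, nm]))
names(cols) <- colnames(m)

df <- as.data.frame(cols, stringsAsFactors = FALSE)
str(df)
```

Flattening column by column keeps Name as character and A, B, C as numeric, instead of coercing everything to one type.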

However, if the list elements are not consistent, rbind recycles the shorter element to fill out the row, which is annoying (though it does warn, thankfully):

do.call(rbind, this_list)
#> Warning in (function (..., deparse.level = 1) : number of columns of result
#> is not a multiple of vector length (arg 2)
#>      Name  A B C D    
#> [1,] "One" 2 3 4 5    
#> [2,] "Two" 5 2 1 "Two"
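Because rbind matches by position rather than by name, one way to guard against this is to check that every element has identical names before binding. A minimal sketch, where has_consistent_names is a hypothetical helper and not part of base R:

```r
this_list <- list(
  list(Name = "One", A = 2, B = 3, C = 4, D = 5),
  list(Name = "Two", A = 5, B = 2, C = 1)
)

# Hypothetical helper: TRUE only if all elements share the same names in the same order
has_consistent_names <- function(rows) {
  length(unique(lapply(rows, names))) == 1L
}

has_consistent_names(this_list)                                    # FALSE: second element lacks D
has_consistent_names(list(this_list[[1]][1:4], this_list[[2]]))    # TRUE
```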

Thus the need for a more robust solution, e.g.

rbind_list <- function(lists, ...){    # argument named `lists` to avoid shadowing base::list
    # generate a vector of all variable names
    vars <- Reduce(function(x, y){union(x, names(y))}, lists, init = c())

    filled_list <- lapply(lists, function(x){
        x <- x[vars]    # add missing elements, reordering if necessary
        names(x) <- vars    # fix missing names
        x <- lapply(x, function(y){
            if (is.null(y)) {    # replace NULL with NA
                NA
            } else if (is.list(y)) {
                if (length(y) != 1) y <- list(y)    # handle non-length-1 list columns
                I(y)    # add as-is class to list columns so they don't fail
            } else {
                y
            }
        }) 
        as.data.frame(x, ...)    # coerce to data frame
    })

    do.call(rbind, filled_list)    # rbind resulting list of data frames
}
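To make the padding step inside the function easier to follow, here it is applied by hand to the second element from the question. The variable names padded and filled are illustrative only.

```r
this_list <- list(
  list(Name = "One", A = 2, B = 3, C = 4, D = 5),
  list(Name = "Two", A = 5, B = 2, C = 1)
)

# Union of all element names, in order of first appearance
vars <- Reduce(function(x, y) union(x, names(y)), this_list, init = c())

# Subsetting a list by a name it lacks yields a NULL element whose name is NA
padded <- this_list[[2]][vars]
names(padded)
#> [1] "Name" "A"    "B"    "C"    NA

# Restore the names, then swap NULL for NA so the row has a value in every column
names(padded) <- vars
filled <- lapply(padded, function(y) if (is.null(y)) NA else y)
str(filled)
```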

rbind_list does decidedly better than do.call(rbind, ...):

rbind_list(this_list, stringsAsFactors = FALSE)
#>   Name A B C  D
#> 1  One 2 3 4  5
#> 2  Two 5 2 1 NA

rbind_list(c(this_list, this_list))
#>   Name A B C  D
#> 1  One 2 3 4  5
#> 2  Two 5 2 1 NA
#> 3  One 2 3 4  5
#> 4  Two 5 2 1 NA

rbind_list(list(list(a = 1), list(b = 2)))
#>    a  b
#> 1  1 NA
#> 2 NA  2

rbind_list(list(list(a = 1), list(a = 1, b = 2)))
#>   a  b
#> 1 1 NA
#> 2 1  2

rbind_list(list(list(a = 1, b = 2), list(b = 2, a = 1)))
#>   a b
#> 1 1 2
#> 2 1 2

...though list column handling is still inconsistent:

# correct; is a list column
rbind_list(list(list(a = 1, c = list('foo')), list(a = 1, c = list('baz'))))
#>   a   c
#> 1 1 foo
#> 2 1 baz

# also correct
rbind_list(list(list(a = 1, c = list(c('foo', 'bar'))), list(a = 1, c = list('baz'))))
#>   a        c
#> 1 1 foo, bar
#> 2 1      baz

# can handle non-encapsulated nested lists
rbind_list(list(list(a = 1, c = list('foo', 'bar')), list(a = 1, c = list('baz'))))
#>   a        c
#> 1 1 foo, bar
#> 2 1      baz

# ...which confuses dplyr
dplyr::bind_rows(list(list(a = 1, c = list('foo', 'bar')), list(a = 1, c = list('baz'))))
#> Error in bind_rows_(x, .id): Argument 2 must be length 1, not 2

# ...but fills missing list elements with NA because it doesn't track classes across observations
rbind_list(list(list(a = 1), list(c = list('baz'))))
#>    a   c
#> 1  1  NA
#> 2 NA baz

# ...which dplyr handles better
dplyr::bind_rows(list(list(a = 1), list(c = list('baz'))))
#> # A tibble: 2 x 2
#>       a c        
#>   <dbl> <list>   
#> 1  1.00 <NULL>   
#> 2 NA    <chr [1]>

While certainly more robust than do.call(rbind, ...), at scale this approach is likely to be considerably slower than package implementations written in C or C++.
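If staying in base R matters, one option that avoids building a data frame per row is to assemble the result column by column instead. This is a different technique from the function above, it has not been benchmarked here, and it assumes every value is a length-1 atomic (no list columns). The name rows_to_df is hypothetical.

```r
this_list <- list(
  list(Name = "One", A = 2, B = 3, C = 4, D = 5),
  list(Name = "Two", A = 5, B = 2, C = 1)
)

# Hypothetical helper: build each column once rather than one data frame per row.
# Assumes scalar atomic values; missing elements become NA.
rows_to_df <- function(rows) {
  vars <- Reduce(union, lapply(rows, names))
  cols <- lapply(vars, function(v) {
    vals <- lapply(rows, function(r) if (is.null(r[[v]])) NA else r[[v]])
    unlist(vals)
  })
  names(cols) <- vars
  as.data.frame(cols, stringsAsFactors = FALSE)
}

df <- rows_to_df(this_list)
df
```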

alistaire answered Mar 09 '23 00:03