Convert nested list with different names to data.frame filling NA and adding column

I need a base R solution to convert a nested list whose sublists have different names into a data.frame:

mylist <- list(list(a=1,b=2), list(a=3), list(b=5), list(a=9, z=list('k')))

convert(mylist)
## returns a data.frame:
##
##     a     b    z           
##     1     2    <NULL>   
##     3    NA    <NULL>   
##    NA     5    <NULL>   
##     9    NA    <chr [1]>

I know this can be done easily with dplyr::bind_rows or data.table::rbindlist with fill = TRUE (not ideal, though, since it fills the character column with NULL rather than NA), but I really do need a solution in base R. To simplify the problem, a 2-level nested list with no third-level lists is also fine, such as

mylist <- list(list(a=1,b=2), list(a=3), list(b=5), list(a=9, z='k'))

convert(mylist)
## returns a data.frame:
##
##     a     b    z           
##     1     2    NA   
##     3    NA    NA   
##    NA     5    NA   
##     9    NA    k  

I have tried something like

convert <- function(L) as.data.frame(do.call(rbind, L))

This neither fills the missing values with NA nor adds the additional column z.
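To see why, it may help to inspect what rbind builds from the simplified mylist. The following is only a minimal sketch; the variable name m is chosen for illustration.

```r
mylist <- list(list(a = 1, b = 2), list(a = 3), list(b = 5), list(a = 9, z = "k"))

# rbind() treats each sublist as one row and aligns values by position,
# not by element name; shorter sublists are recycled to the widest one.
m <- do.call(rbind, mylist)

dim(m)        # 4 rows, 2 columns (the length of the longest sublist)
colnames(m)   # column names are taken from the first sublist
m[[3, "a"]]   # the value stored as b = 5 ends up under column "a"
```

Because values are matched by position, the z element of the last sublist lands in the second column and no z column is ever created.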

Update

mylist here is just a simplified example. In reality I cannot assume the names of the sublist elements (a, b and z in the example), nor the sublist lengths (2, 1, 1, 2 in the example).

Here are the assumptions for expected data.frame and the input mylist:

  1. The number of columns of the expected data.frame is determined by the maximum length of the sublists, which could vary from 1 to several hundred. There is no explicit source of information about the length of each sublist (which names appear or disappear in which sublist is unknown):
       max(sapply(mylist, length)) <= 1000 ## ==> TRUE
  2. The number of rows of the expected data.frame is determined by the length of mylist, which could vary from 1 to several thousand:
       dplyr::between(length(mylist), 0, 10000) ## ==> TRUE
  3. There is no explicit information about the names of the sublist elements or their order, so the column names and column order of the expected data.frame can only be determined from mylist itself.
  4. Each sublist contains elements of type numeric, character or list. To simplify the problem, consider only numeric and character.
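Assumption 3 can be made concrete in base R. The sketch below uses the simplified mylist and illustrative variable names (col_names, widest, rows_ok) to derive the column names from the data itself. It also suggests that the column count follows the number of distinct names rather than the longest sublist: here the widest sublist has 2 elements, yet 3 names occur.

```r
mylist <- list(list(a = 1, b = 2), list(a = 3), list(b = 5), list(a = 9, z = "k"))

# Distinct element names across all sublists, in order of first appearance
col_names <- unique(unlist(lapply(mylist, names)))

# Length of the longest sublist
widest <- max(lengths(mylist))

# Base R stand-in for the dplyr::between() check on the number of rows
rows_ok <- length(mylist) >= 0 && length(mylist) <= 10000
```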
englealuze asked Dec 19 '25 02:12


2 Answers

A shorter solution in base R would be

make_df <- function(a = NA, b = NA, z = NA) {
  data.frame(a = unlist(a), b = unlist(b), z = unlist(z))
}

do.call(rbind, lapply(mylist, function(x) do.call(make_df, x)))
#>    a  b    z
#> 1  1  2 <NA>
#> 2  3 NA <NA>
#> 3 NA  5 <NA>
#> 4  9 NA    k

Update

A more general solution that uses the same method but does not require specific names would be:

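build_data_frame <- function(obj) {
  # Union of all element names, in order of first appearance
  nms     <- unique(unlist(lapply(obj, names)))
  # Formal arguments: one per name, each defaulting to NA
  frmls   <- as.list(setNames(rep(NA, length(nms)), nms))
  # One unlist(<name>) call per column
  dflst   <- setNames(lapply(nms, function(x) call("unlist", as.symbol(x))), nms)
  # Assemble a function whose body builds a one-row data.frame
  make_df <- as.function(c(frmls, call("do.call", "data.frame", dflst)))
  
  # Iterate over the obj argument rather than the global mylist
  do.call(rbind, lapply(obj, function(x) do.call(make_df, x)))
}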

This allows

build_data_frame(mylist)
#>    a  b    z
#> 1  1  2 <NA>
#> 2  3 NA <NA>
#> 3 NA  5 <NA>
#> 4  9 NA    k
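The same result can probably also be reached without constructing a function on the fly. The following is a minimal sketch of a different technique (looking up each name in every sublist), not the method used above. rows_to_df is a hypothetical name, and the sketch assumes every element is a length-one numeric or character value.

```r
# Hypothetical helper, not part of the answer above.
# Assumes each sublist element is a length-one numeric or character value.
rows_to_df <- function(obj) {
  # Union of all element names, in order of first appearance
  nms <- unique(unlist(lapply(obj, names)))
  # Build one column per name; sublists lacking that name contribute NA
  cols <- lapply(nms, function(nm) {
    vals <- lapply(obj, function(x) if (is.null(x[[nm]])) NA else x[[nm]])
    unlist(vals)
  })
  as.data.frame(setNames(cols, nms), stringsAsFactors = FALSE)
}

mylist <- list(list(a = 1, b = 2), list(a = 3), list(b = 5), list(a = 9, z = "k"))
df <- rows_to_df(mylist)
```

Building whole columns at once avoids one rbind call per row, which may matter for inputs with several thousand sublists.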
Allan Cameron answered Dec 20 '25 16:12


We can try the base R code below

subset(
    Reduce(
        function(...) {
            merge(..., all = TRUE)
        },
        Map(
            function(k, x) cbind(id = k, list2DF(x)),
            seq_along(mylist), mylist
        )
    ),
    select = -id
)

which gives

   a  b  z
1  1  2 NA
2  3 NA NA
3 NA  5 NA
4  9 NA  k
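To make the role of the id column clearer, the pipeline can be split into steps. This sketch assumes R 4.0.0 or later (where list2DF is available) and uses the illustrative names pieces and merged.

```r
mylist <- list(list(a = 1, b = 2), list(a = 3), list(b = 5), list(a = 9, z = "k"))

# Step 1: turn every sublist into a one-row data.frame tagged with its position
pieces <- Map(
    function(k, x) cbind(id = k, list2DF(x)),
    seq_along(mylist), mylist
)

# Step 2: full outer join of all pieces on whatever columns they share
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), pieces)

# Step 3: drop the helper column
result <- subset(merged, select = -id)
```

Since id is unique per sublist and always among the shared columns, no two pieces can match each other. Without it, merge would join sublists that happen to share values in their common columns, so two input rows could collapse into one.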
ThomasIsCoding answered Dec 20 '25 15:12
