Neatest way to build a data frame from a list of lists in R

Tags: list, r

I have a list of sub-lists that I wish to convert to a data frame (specifically as a tibble); for example:

myList <- list(
        list(var1=1,var2=2,var3=3,var4=4,var5=5,var6=6),
        list(var1=4,var2=5,var3=6,var4=7,var5=8,var6=9),
        list(var1=7,var2=8,var3=9,var4=1,var5=2,var6=3)
)

Using the following code, I can extract chosen variables into a tibble:

myDF <- tbl_df(cbind(
  var1 = lapply(myList, '[[', "var1"),
  var2 = lapply(myList, '[[', "var2"),
  var5 = lapply(myList, '[[', "var5"),
  var6 = lapply(myList, '[[', "var6")
))  

But it is quite verbose. Is there a more succinct way (perhaps using a purrr map function) that can pull chosen sub-elements out of each list and populate them into a row?

Further, if the sub-lists themselves contain lists, how best to extract elements of those inner lists? For example:

 myList <- list(
        list(var1=1,var2=2,var3=3,list4=list(varA="a",varB="b")),
        list(var1=4,var2=5,var3=6,list4=list(varA="c",varB="d")),
        list(var1=7,var2=8,var3=9,list4=list(varA="e",varB="f"))
)    

How could I get something like the following to work:

myDF <- tbl_df(cbind(
  var1 = lapply(myList, '[[', "var1"),
  var2 = lapply(myList, '[[', "var2"),
  var4 = lapply(myList, '[[', "list4$varA")
)) 

Here I want to extract a specific element from list4, but using $ notation inside the name string to drill down to the next level does not work.

Brisbane Pom asked Feb 04 '18 12:02


2 Answers

Since data frames are just lists, if your list isn't nested more than once you can convert each sub-list to a data frame and bind the results together by row:

library(tidyverse)
myList %>%
  map(as.data.frame) %>%
  bind_rows() %>%
  select(var1, var2, var5, var6)

#    var1 var2 var5 var6
# 1    1    2    5    6
# 2    4    5    8    9
# 3    7    8    2    3

Or even simpler, since bind_rows() works directly on a list of lists:

myList %>%
  bind_rows() %>%
  select(var1, var2, var5, var6)

#    var1  var2  var5  var6
#    <dbl> <dbl> <dbl> <dbl>
# 1  1.00  2.00  5.00  6.00
# 2  4.00  5.00  8.00  9.00
# 3  7.00  8.00  2.00  3.00

However, sometimes the sub-lists share only some of their elements and you want to select just those before binding:

myList %>%
  map(as.data.frame) %>%
  map(~ select(.x, var1, var2, var5, var6)) %>%
  bind_rows()

#    var1 var2 var5 var6
# 1    1    2    5    6
# 2    4    5    8    9
# 3    7    8    2    3

For cases where the lists are nested more than once, investigate using flatten() from purrr:

myList2 <- list(
  list(var1=1,var2=2,var3=3,list4=list(varA="a",varB="b")),
  list(var1=4,var2=5,var3=6,list4=list(varA="c",varB="d")),
  list(var1=7,var2=8,var3=9,list4=list(varA="e",varB="f"))
)  

myList2 %>%
  map(flatten) %>%
  bind_rows()

#   var1  var2  var3 varA  varB 
#   <dbl> <dbl> <dbl> <chr> <chr>
# 1  1.00  2.00  3.00 a     b    
# 2  4.00  5.00  6.00 c     d    
# 3  7.00  8.00  9.00 e     f  

Then apply select() as desired; the column names will be the names of the respective elements. Be very careful with duplicate names across the flattened elements, as only one of them will be kept.
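For the nested case in the question, where putting $ inside the name string does not work, one possible sketch is to hand purrr a character vector instead: map() and its typed variants treat such a vector as a path and extract level by level. This assumes every sub-list has a list4 element containing varA; myDF2 is only an illustrative name.

```r
library(purrr)
library(tibble)

myList2 <- list(
  list(var1 = 1, var2 = 2, var3 = 3, list4 = list(varA = "a", varB = "b")),
  list(var1 = 4, var2 = 5, var3 = 6, list4 = list(varA = "c", varB = "d")),
  list(var1 = 7, var2 = 8, var3 = 9, list4 = list(varA = "e", varB = "f"))
)

# A single string extracts a top-level element by name;
# a character vector of length 2 drills down one level further.
myDF2 <- tibble(
  var1 = map_dbl(myList2, "var1"),
  var2 = map_dbl(myList2, "var2"),
  var4 = map_chr(myList2, c("list4", "varA"))
)
```

Unlike the flatten() approach, this leaves the nesting intact and picks only the named elements, so duplicate names at different levels are not an issue.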

There may be situations where the enframe() function from tibble is also useful.
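
As a rough illustration of the enframe() idea, the sketch below wraps the list in a two-column tibble and then widens the list-column. The unnest_wider() step comes from tidyr and is an assumption about how one might finish the job, not something prescribed above; the column names row and record are arbitrary.

```r
library(tibble)
library(tidyr)
library(dplyr)

myList <- list(
  list(var1 = 1, var2 = 2, var3 = 3, var4 = 4, var5 = 5, var6 = 6),
  list(var1 = 4, var2 = 5, var3 = 6, var4 = 7, var5 = 8, var6 = 9),
  list(var1 = 7, var2 = 8, var3 = 9, var4 = 1, var5 = 2, var6 = 3)
)

myDF <- myList %>%
  enframe(name = "row", value = "record") %>%  # one row per sub-list; record is a list-column
  unnest_wider(record) %>%                     # spread each named element into its own column
  select(var1, var2, var5, var6)
```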

zacdav answered Sep 29 '22 19:09


For the first case, a possible base-R solution:

> data.frame(do.call(rbind, myList))[c("var1", "var2", "var5", "var6")]
  var1 var2 var5 var6
1    1    2    5    6
2    4    5    8    9
3    7    8    2    3
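
Note that rbind() on plain lists builds a list-mode matrix, so the columns of the result above are list-columns rather than numeric vectors. If atomic columns are wanted, one hedged base-R variant is to turn each sub-list into a one-row data frame first; keep and rows are illustrative names.

```r
myList <- list(
  list(var1 = 1, var2 = 2, var3 = 3, var4 = 4, var5 = 5, var6 = 6),
  list(var1 = 4, var2 = 5, var3 = 6, var4 = 7, var5 = 8, var6 = 9),
  list(var1 = 7, var2 = 8, var3 = 9, var4 = 1, var5 = 2, var6 = 3)
)

keep <- c("var1", "var2", "var5", "var6")

# Subset each sub-list to the wanted names, then convert to a one-row data frame
rows <- lapply(myList, function(x) as.data.frame(x[keep]))

# Stack the one-row data frames; columns stay plain numeric vectors
myDF <- do.call(rbind, rows)
```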
Esteban PS answered Sep 29 '22 20:09