Consider the following list:
x <- list("a" = list("b", "c"),
          "d" = list("e", "f" = list("g", "h")),
          "i" = list("j", "k" = list("l" = list("m", "n" = list("o", "p")))))
It is worth noting that:
Given x, my aim is to output the data frame:
y <- data.frame(
  main_level = c(rep("a", 2), rep("d", 3), rep("i", 4)),
  level1 = c("b", "c", "e", rep("f", 2), "j", rep("k", 3)),
  level2 = c(NA, NA, NA, "g", "h", NA, "l", "l", "l"),
  level3 = c(NA, NA, NA, NA, NA, NA, "m", "n", "n"),
  level4 = c(NA, NA, NA, NA, NA, NA, NA, "o", "p")
)
> y
  main_level level1 level2 level3 level4
1          a      b   <NA>   <NA>   <NA>
2          a      c   <NA>   <NA>   <NA>
3          d      e   <NA>   <NA>   <NA>
4          d      f      g   <NA>   <NA>
5          d      f      h   <NA>   <NA>
6          i      j   <NA>   <NA>   <NA>
7          i      k      l      m   <NA>
8          i      k      l      n      o
9          i      k      l      n      p
NOTE that a typo was corrected in y above.
The above implies that there will be a variable number of columns as well, depending on the depth of the nesting.
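The number of columns can be worked out up front by measuring the longest path from a top-level name down to a leaf. A minimal base-R sketch, assuming (as in x) that every sub-list is named and every leaf is a single string; path_length() and n_columns() are hypothetical helper names:

```r
# Hypothetical helper: columns occupied by the path below `node`.
# A sub-list takes one column for its own name plus its deepest child;
# a leaf takes one column for its value.
path_length <- function(node) {
  # A leaf value (or an empty list) fills exactly one column.
  if (!is.list(node) || length(node) == 0L) return(1L)
  1L + max(vapply(node, path_length, integer(1)))
}

# Hypothetical helper: total number of columns needed, i.e. main_level
# plus however many levelN columns the deepest top-level element requires.
n_columns <- function(x) {
  max(vapply(x, path_length, integer(1)))
}
```

For the x above, n_columns(x) returns 5 (main_level plus level1 to level4), which matches purrr::vec_depth(x) - 1 for this list.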
The solutions for nested lists that I have found online assume either that the naming structure is more or less consistent, which is not the case here, or that every branch has the same depth. For instance, the answers to "How to convert a nested lists to dataframe in R?" and "Converting nested list to dataframe" do not apply, because the lists there are named much more consistently.
Here's a way mainly relying on rrapply:
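# melt the list into a data frame: one row per leaf, path in L1, L2, ..., leaf in value
rrapply::rrapply(x, how = "melt") |>
  apply(1, function(row) {
    # keep only entries that contain a letter (drops the NA padding, among others)
    newrow <- row[grep("[A-Za-z]", row)]
    # pad with NA up to the total number of columns
    length(newrow) <- purrr::vec_depth(x) - 1
    newrow
  }) |>
  # apply() returns each processed row as a column, so transpose back
  t() |> as.data.frame() |>
  `colnames<-`(c("main_level", paste0("level", 1:4)))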
Output:
  main_level level1 level2 level3 level4
1          a      b   <NA>   <NA>   <NA>
2          a      c   <NA>   <NA>   <NA>
3          d      e   <NA>   <NA>   <NA>
4          d      f      g   <NA>   <NA>
5          d      f      h   <NA>   <NA>
6          i      j   <NA>   <NA>   <NA>
7          i      k      l      m   <NA>
8          i      k      l      n      o
9          i      k      l      n      p
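Note that this is still quite crude, and there may well be a better way to reshape the output of rrapply. For instance, row[grep("[A-Za-z]", row)] keeps only entries that contain a letter, so it would silently drop a genuine name or value such as "2021". I have also not tested whether length(newrow) <- purrr::vec_depth(x) - 1 is a reliable way of guessing the number of columns, but it works here. The column names are also hard-coded to four levels, and since purrr 1.0.0 vec_depth() is deprecated in favour of pluck_depth().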
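One way to avoid hard-coding the four level names is to derive them from the number of columns at the end of the pipe. A minimal sketch; name_levels() is a hypothetical helper, not part of rrapply or purrr:

```r
# Hypothetical helper: name the first column "main_level" and the rest
# "level1", "level2", ... however many there are.
name_levels <- function(df) {
  # sprintf() returns character(0) for a zero-length input, so a
  # one-column data frame gets just "main_level" (paste0() would not).
  stats::setNames(df, c("main_level", sprintf("level%d", seq_len(ncol(df) - 1))))
}
```

With the native pipe it could replace the last step, i.e. `... |> t() |> as.data.frame() |> name_levels()`.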
Here is a recursive function that makes no assumptions beyond the structure you described:
list_to_df <- function(l) {
  leaves <- list()
  go_deeper <- function(l, index = 1, path = NULL) {
    # we can still go deeper
    if (is.list(l[[index]])) {
      path <- c(path, names(l)[index])
      l <- l[[index]]
      lapply(seq_along(l), function(i) go_deeper(l, i, path))
    # this is the final node (leaf)
    } else {
      leaves <<- c(leaves, list(c(path, l[[index]])))
    }
  }
  # this saves the paths to each last node (leaf) in 'leaves' as a side effect
  go_deeper(list(l))
  # now just make a data frame from the 'leaves' list
  len.max <- max(lengths(leaves))
  leaves <- sapply(leaves, function(x) c(x, rep(NA, len.max - length(x))))
  leaves <- as.data.frame(t(leaves))
  names(leaves) <- c('main_level', paste0('level', seq_len(ncol(leaves) - 1)))
  leaves
}
list_to_df(x)
#   main_level level1 level2 level3 level4
# 1          a      b   <NA>   <NA>   <NA>
# 2          a      c   <NA>   <NA>   <NA>
# 3          d      e   <NA>   <NA>   <NA>
# 4          d      f      g   <NA>   <NA>
# 5          d      f      h   <NA>   <NA>
# 6          i      j   <NA>   <NA>   <NA>
# 7          i      k      l      m   <NA>
# 8          i      k      l      n      o
# 9          i      k      l      n      p
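The same recursion can also be written without the `<<-` side effect, by letting each call return the paths found below it and combining them at the end. A sketch under the same assumptions about the structure (named sub-lists, unnamed leaf values); collect_paths() and paths_to_df() are hypothetical names, and an unnamed sub-list simply adds no level to the path:

```r
# Hypothetical helper: return every root-to-leaf path below `node` as a
# list of character vectors, without modifying any outer variable.
collect_paths <- function(node, path = character(0)) {
  if (!is.list(node)) {
    # Leaf: one path per value (each leaf in the question is one string).
    return(lapply(node, function(v) c(path, v)))
  }
  nms <- names(node)
  if (is.null(nms)) nms <- rep("", length(node))
  per_child <- lapply(seq_along(node), function(i) {
    child <- node[[i]]
    # A named sub-list contributes its name as the next level of the path.
    step <- if (is.list(child) && nzchar(nms[i])) nms[i] else character(0)
    collect_paths(child, c(path, step))
  })
  # Flatten one level: a list of lists of paths becomes a list of paths.
  unlist(per_child, recursive = FALSE)
}

# Hypothetical helper: pad the paths with NA and bind them into a data frame.
paths_to_df <- function(paths) {
  width <- max(lengths(paths))
  rows <- lapply(paths, function(p) c(p, rep(NA_character_, width - length(p))))
  df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(df) <- c("main_level", sprintf("level%d", seq_len(width - 1)))
  df
}
```

With the x from the question, paths_to_df(collect_paths(x)) gives the same nine rows as list_to_df(x). Returning values instead of appending to `leaves` keeps each call independent, at the cost of one unlist() per level.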