I have a list with the following example structure:
> dput(test)
structure(list(id = 1, var1 = 2, var3 = 4, section1 = structure(list(
var1 = 1, var2 = 2, var3 = 3), .Names = c("var1", "var2",
"var3")), section2 = structure(list(row = structure(list(var1 = 1,
var2 = 2, var3 = 3), .Names = c("var1", "var2", "var3")),
row = structure(list(var1 = 4, var2 = 5, var3 = 6), .Names = c("var1",
"var2", "var3")), row = structure(list(var1 = 7, var2 = 8,
var3 = 9), .Names = c("var1", "var2", "var3"))), .Names = c("row",
"row", "row"))), .Names = c("id", "var1", "var3", "section1",
"section2"))
> str(test)
List of 5
$ id : num 1
$ var1 : num 2
$ var3 : num 4
$ section1:List of 3
..$ var1: num 1
..$ var2: num 2
..$ var3: num 3
$ section2:List of 3
..$ row:List of 3
.. ..$ var1: num 1
.. ..$ var2: num 2
.. ..$ var3: num 3
..$ row:List of 3
.. ..$ var1: num 4
.. ..$ var2: num 5
.. ..$ var3: num 6
..$ row:List of 3
.. ..$ var1: num 7
.. ..$ var2: num 8
.. ..$ var3: num 9
Notice that the section2 list contains elements named row. These represent multiple records. What I have is a nested list where some elements are at the root level and others are multiple nested records for the same observation. I would like the following output in data.frame format:
> desired
id var1 var3 section1.var1 section1.var2 section1.var3 section2.var1 section2.var2 section2.var3
1 1 2 4 1 2 3 1 4 7
2 NA NA NA NA NA NA 2 5 8
3 NA NA NA NA NA NA 3 6 9
Root-level elements should populate the first row, while row elements should have their own rows. As an added complication, the number of variables in the row entries can vary.
Here's a general approach. It doesn't assume that you'll have only three rows; it works with however many you have. And if a value is missing in the nested structure (e.g. var1 doesn't exist for some sub-lists in section2), the code returns NA for that cell.
E.g. if we use the following data:
test <- structure(list(id = 1, var1 = 2, var3 = 4, section1 = structure(list(var1 = 1, var2 = 2, var3 = 3), .Names = c("var1", "var2", "var3")), section2 = structure(list(row = structure(list(var1 = 1, var2 = 2), .Names = c("var1", "var2")), row = structure(list(var1 = 4, var2 = 5), .Names = c("var1", "var2")), row = structure(list( var2 = 8, var3 = 9), .Names = c("var2", "var3"))), .Names = c("row", "row", "row"))), .Names = c("id", "var1", "var3", "section1", "section2"))
The general approach is to use melt to create a dataframe that includes information about the nested structure, and then dcast to mold it into the format you desire.
library("reshape2")
flat <- unlist(test, recursive=FALSE)
row_idx <- grep("row", names(flat)) ## positions of the 'row' elements
names(flat)[row_idx] <- gsub("row", "var", paste0(names(flat)[row_idx], seq_along(row_idx))) ## keeps track of rows by adding an ID
ul <- melt(unlist(flat))
parts <- strsplit(rownames(ul), split=".", fixed=TRUE) ## splits the names into component parts
depth <- max(lengths(parts)) ## deepest nesting level found in the names
pad <- function(a) {
  c(a, rep(NA, depth-length(a))) ## pad shorter names with NA
}
lev <- matrix(unlist(lapply(parts, FUN=pad)), ncol=depth, byrow=TRUE)
## Get the nesting structure (the unnamed matrix columns become X1, X2, X3)
nested <- data.frame(lev, ul)
nested$X3[is.na(nested$X3)] <- levels(as.factor(nested$X3))[[1]]
desired <- dcast(nested, X3~X1 + X2)
names(desired) <- gsub("_", "\\.", gsub("_NA", "", names(desired)))
desired <- desired[,names(flat)]
> desired
## id var1 var3 section1.var1 section1.var2 section1.var3 section2.var1 section2.var2 section2.var3
## 1 1 2 4 1 2 3 1 4 7
## 2 NA NA NA NA NA NA 2 5 8
## 3 NA NA NA NA NA NA 3 6 9
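To see what the name-splitting and padding step does in isolation, here is a minimal base-R sketch. The names in nms are made up for illustration (they only mimic the dotted names that unlist() produces), and nms, parts, depth, padded and lev are hypothetical variable names, not part of the code above.

```r
# Hypothetical dotted names, shaped like those unlist() gives for a nested list
nms <- c("id", "section1.var1", "section2.var1.var2")

# Split each name on the literal dot
parts <- strsplit(nms, split = ".", fixed = TRUE)

# Depth of the most deeply nested name
depth <- max(lengths(parts))

# Pad shorter names with NA so every name has exactly `depth` components
padded <- lapply(parts, function(p) c(p, rep(NA_character_, depth - length(p))))

# One row per name, one column per nesting level
lev <- matrix(unlist(padded), ncol = depth, byrow = TRUE)
print(lev)
```

Each column of lev then corresponds to one nesting level, which is what lets dcast treat the levels as separate variables.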
The central idea of this solution is to flatten all sub-lists except those named 'row'. This is done by creating a unique ID for each list element (stored in z) and then requiring that all elements within a single 'row' share the same ID (stored in z2; this needs a recursive function to traverse the nested list). Then z2 can be used to group elements that belong to the same row. The resulting list can be converted into a matrix using stri_list2matrix from the stringi package, and then converted into a data frame.
utest <- unlist(test)
z <- relist(seq_along(utest),test)
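recurse <- function(L) {
  if (!is.list(L)) return(L) # leaf: keep its ID unchanged
  b <- names(L)=='row'
  # within each 'row', give every element the ID of the row's first element
  L.b <- lapply(L[b],function(k) relist(rep(k[[1]],length(k)),k))
  # recurse into everything that is not a 'row'
  L.nb <- lapply(L[!b],recurse)
  c(L.b,L.nb)
}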
z2 <- unlist(recurse(z))
library(stringi)
desired <- as.data.frame(stri_list2matrix(split(utest,z2)))
names(desired) <- names(z2)[unique(z2)]
desired
# id var1 var3 section1.var1 section1.var2 section1.var3 section2.row.var1
# 1 1 2 4 1 2 3 1
# 2 <NA> <NA> <NA> <NA> <NA> <NA> 2
# 3 <NA> <NA> <NA> <NA> <NA> <NA> 3
# section2.row.var1 section2.row.var1
# 1 4 7
# 2 5 8
# 3 6 9
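The ID trick above may be easier to follow on a smaller list. This is only a sketch in base R under stated assumptions: the list x is a made-up, cut-down version of the example, and x, leaves, ids, row_ids and groups are hypothetical names that do not appear in the answer's code.

```r
# Hypothetical small list: one root-level element and two 'row' records
x <- list(id = 1,
          section2 = list(row = list(var1 = 1, var2 = 2),
                          row = list(var1 = 4, var2 = 5)))

# All leaves as one named vector
leaves <- unlist(x)

# Same nesting as x, but each leaf replaced by its position 1..n
ids <- relist(seq_along(leaves), x)

# Inside each 'row', overwrite every ID with the ID of the row's first element
row_ids <- lapply(ids$section2,
                  function(r) relist(rep(r[[1]], length(r)), r))

# Grouping the leaves by these IDs keeps each record together
groups <- split(leaves, c(ids$id, unlist(row_ids)))
print(groups)
```

Here the root-level id ends up in a group of its own, while each row's values are collected into a single group, which is the grouping that split(utest,z2) relies on.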