I will put dput
of what my list looks like at the bottom such that the q can be reproducible. The dput is of a
not x
.
I have a big nested list called x
that I'm trying to build a data frame from but cannot figure it out.
I have done the first part:
for(i in 1:3){a[[i]]<-x$results[[i]]$experiences
indx <- lengths(a)
zz <- as.data.frame(do.call(rbind,lapply(a, `length<-`, max(indx))))}
For this I used the following answer: Converting nested list (unequal length) to data frame
This leaves me a data.frame with n columns for results where n is the max results for any i:
v1 v2 v3
1 NULL NULL NULL
2 * * *
3 NULL NULL NULL
Each * is another nested list in the format list(experience = list(duration = ...
For example the first *
in row 2, column v1. I don't want the total list. I only want:
a[[2]][[1]]$experience$start
or in terms of the original list x:
x$results[[2]]$experiences[[1]]$experience$start
I feel like I'm nearly there with some tweaks. I tried:
for(i in 1:3){a[[i]]<-x$results[[i]]$experiences
indx <- lengths(a)
for(y in 1:length(a[[i]])) aa <- rbind(aa,tryCatch(x$results[[i]]$experiences[[y]]$experience$start, error=function(e) print(NA)))
zz <- as.data.frame(do.call(rbind,lapply(aa, `length<-`, max(indx))))}
Resulting in:
v1 v2 v3
1 NA NA NA
2 NA NA NA
3 2014 NA NA
4 2012 NA NA
5 2006 NA NA
6 NA NA NA
7 NA NA NA
Tried cbind instead of rbind on final line and that put all the dates in the first row.
I also tried the following:
for(i in 1:3){a[[i]]<-lengths(x$results[[i]]$experiences)
indx <- lengths(a)
for(y in 1:length(indx)){tt[i] <- tryCatch(x$results[[i]]$experiences[[y]]$experience$start, error=function(e) print(""))}
zz <- as.data.frame(do.call(rbind,lapply(tt, `length<-`, max(indx))))}
This came close, builds the right format but only returns the first result:
v1 v2 v3
1 NA NA NA
2 2014 NA NA
3 NA NA NA
The format I want is:
V1 V2 V3
1 NA NA NA
2 2014 2012 2006
3 NA NA NA
((Sample data now at bottom))
Newest attempt:
Doing the following but returns only the first start date from each a[[i]]
, the second loop I need to make the list aa[i][y]
something different.
for(i in 1:3){a[[i]]<-x$results[[i]]$experiences
for(y in 1:length(a[[i]])){aa[i][y] = if(is.null(a[[i]][[y]]$experience$start)){"NULL"}else{a[[i]][[y]]$experience$start}}}
So for dput2
I'd like the form:
v1 v2 v3 v4 v5 v6 v7 v8
1 2015
2 2011 2007 null null null null null null
3 2016 2015 2015 2015 2013 2010
I dont mind if the blanks are null or na
UPDATE
The below answer almost works, however in my data the structure changes, the order of the names (roleName, duration etc) change so that ruins the answer as cumsum
is used to determine when a new list is found. If you have duration
then start
the keys are 9
and 1
and the cumsum
part labels them two different lists.
I wrote the following:
my.list <- list(structure(
list(
experience = structure(
list(
start = "1",
end = "1",
roleName = "a",
summary = "a",
duration = "a",
current = "a",
org = structure(list(name = "a", url = "a"), .Names = c("name","url")),
location = structure(
list(
displayLocation = NULL,
lat = NULL,
lng = NULL
),
.Names = c("displayLocation",
"lat", "lng")
) ),.Names = c("start", "end", "roleName", "summary", "duration", "current", "org", "location")),
`_meta` = structure(
list(weight = 1L, `_sources` = list(structure(
list(`_origin` = "a"), .Names = "_origin"
))),.Names = c("weight", "_sources"))),.Names = c("experience", "_meta")))
Then:
aa <- lapply(1:length(a), function(y){tryCatch(lapply(1:length(a[[y]]),
function(i){a[[y]][[i]]$experience[names(my.list2[[1]]$experience)]}), error=function(e) print(list()))})
This changes the structure such that key2
will always be in the right order.
However Then I found after this loop I have another issue.
Sometimes I have for example nothing but a roleName in the experience list. If that occurs twice in a row the keys are repeated. cumsum
treats them as the same experience instead of separate ones.
This means I cannot create df3
because of duplicate identifiers for rows. And even if I could by removing troublesome rows, the names wouldn't match as i
in the solution below matches the names using the sequence, if I remove any rows that changes the lengths.
Here is my total code for more insight:
for(i in 1:x$count){a[[i]]<-x$results[[i]]$experiences}
aa <- lapply(1:length(a), function(y){tryCatch(lapply(1:length(a[[y]]),
function(i){a[[y]][[i]]$experience[names(my.list2[[1]]$experience)]}), error=function(e) print(list()))})
aaa <- unlist(aa)
dummydf <- data.frame(b=c("start", "end", "roleName", "summary",
"duration", "current", "org.name", "org.url"), key=1:8)
df <- data.frame(a=aaa, b=names(aaa))
df2 <- left_join(df, dummydf)
df2$key2 <- as.factor(cumsum(df2$key < c(0, df2$key[-length(df2$key)])) +1)
df_split <- split(df2, df2$key2)
df3 <- lapply(df_split, function(x){
x %>% select(-c(key, key2)) %>% spread(b, a)
}) %>% data.table::rbindlist(fill=TRUE) %>% t
df3 <- data.frame(df3)
i <- sapply(seq_along(aa), function(y) rep(y, sapply(aa, function(x) length(x))[y])) %>% unlist
names(df3) <- paste0(names(df3), "_", i)
df4 <- data.frame(t(df3))
df4$dates <- as.Date(NA)
df4$dates <- as.Date(df4$start)
df4 <- data.frame(dates = df4$dates)
df4 <- t(df4)
df4 <- data.frame(df4)
names(df4) <- paste0(names(df4), "_", i)
df4[] <- lapply(df4[], as.character)
l1 <- lapply(split(stack(df4), sub('.*_', '', stack(df4)[,2])), '[', 1)
df5 <- t(do.call(cbindPad, l1))
df5 <- data.frame(df5)
cbindpad
taken from this question
New sample code including the issues:
dput3 =
list(list(), list(
structure(list(experience = structure(list(
duration = "1", start = "2014",
end = "3000", roleName = "a",
summary = "aaa",
org = structure(list(name = "a"), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL,
lng = NULL), .Names = c("displayLocation", "lat", "lng"
))), .Names = c("duration", "start", "end", "roleName", "summary",
"org", "location")), `_meta` = structure(list(weight = 1L, `_sources` = list(
structure(list(`_origin` = ""), .Names = "_origin"))), .Names = c("weight",
"_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(end = "3000",
start = "2012", duration = "2",
roleName = "a", summary = "aaa",
org = structure(list(name = "None"), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("duration", "start", "end", "roleName",
"summary", "org", "location")), `_meta` = structure(list(
weight = 1L, `_sources` = list(structure(list(`_origin` = " "), .Names = "_origin"))), .Names = c("weight", "_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(duration = "3",
start = "2006", end = "3000",
roleName = "a", summary = "aaa", org = structure(list(name = " "), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("duration", "start", "end", "roleName",
"summary", "org", "location")), `_meta` = structure(list(weight = 1L, `_sources` = list(structure(list(`_origin` = ""), .Names = "_origin"))), .Names = c("weight",
"_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(roleName = "a",
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("roleName",
"location")), `_meta` = structure(list(
weight = 1L, `_sources` = list(structure(list(`_origin` = " "), .Names = "_origin"))), .Names = c("weight", "_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(roleName = "a",
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("roleName",
"location")), `_meta` = structure(list(
weight = 1L, `_sources` = list(structure(list(`_origin` = " "), .Names = "_origin"))), .Names = c("weight", "_sources"))), .Names = c("experience", "_meta"))
),
list(
structure(list(experience = structure(list(
duration = "1", start = "2014",
end = "3000", roleName = "a",
summary = "aaa",
org = structure(list(name = "a"), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL,
lng = NULL), .Names = c("displayLocation", "lat", "lng"
))), .Names = c("duration", "start", "end", "roleName", "summary",
"org", "location")), `_meta` = structure(list(weight = 1L, `_sources` = list(
structure(list(`_origin` = ""), .Names = "_origin"))), .Names = c("weight",
"_sources"))), .Names = c("experience", "_meta"))))
Maybe this can help
library(dplyr)
library(tidyr)
a <- unlist(a)
df <- data.frame(a=a, b=names(a)) %>% mutate(key=cumsum(b=="experience.duration")) %>%
split(.$key) %>% lapply(function(x) x %>% select(-key) %>% spread(b, a)) %>%
do.call(rbind, .) %>% t %>% data.frame
df$key <- rownames(df)
Then you can filter in on the rows of interest
The above would be equivalent to
rbind(unlist(a)[1:8], unlist(a)[9:16],unlist(a)[17:24]) %>% t
try this for dput2
a <- unlist(dput2)
library(dplyr)
library(tidyr)
dummydf <- data.frame(b=c("experience.start", "experience.end", "experience.roleName", "experience.summary",
"experience.org", "experience.org.name", "experience.org.url",
"_meta.weight", "_meta._sources._origin", "experience.duration"), key=1:10)
df <- data.frame(a=a, b=names(a))
df2 <- left_join(df, dummydf)
df2$key2 <- as.factor(cumsum(df2$key < c(0, df2$key[-length(df2$key)])) +1)
df_split <- split(df2, df2$key2)
df3 <- lapply(df_split, function(x){
x %>% select(-c(key, key2)) %>% spread(b, a)
}) %>% data.table::rbindlist(fill=TRUE) %>% t
df3 <- data.frame(df3)
i <- sapply(seq_along(dput2), function(y) rep(y, sapply(dput2, function(x) length(x))[y])) %>% unlist
names(df3) <- paste0(names(df3), "_", i)
View(df3)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With