Given a list with several elements, the goal is to get them into a data frame. The map_df()
function from the purrr package is highly useful with regular lists, but gives an error with irregular lists.
For instance, following this tutorial, the following works:
library(purrr)
library(repurrrsive) # The data comes from this package
map_dfr(got_chars, magrittr::extract, c("name", "culture", "gender", "id", "born", "alive"))
# A tibble: 30 x 6
   name               culture  gender    id born                                   alive
   <chr>              <chr>    <chr>  <int> <chr>                                  <lgl>
 1 Theon Greyjoy      Ironborn Male    1022 In 278 AC or 279 AC, at Pyke           TRUE
 2 Tyrion Lannister   ""       Male    1052 In 273 AC, at Casterly Rock            TRUE
 3 Victarion Greyjoy  Ironborn Male    1074 In 268 AC or before, at Pyke           TRUE
 4 Will               ""       Male    1109 ""                                     FALSE
 5 Areo Hotah         Norvoshi Male    1166 In 257 AC or before, at Norvos         TRUE
 6 Chett              ""       Male    1267 At Hag's Mire                          FALSE
 7 Cressen            ""       Male    1295 In 219 AC or 220 AC                    FALSE
 8 Arianne Martell    Dornish  Female   130 In 276 AC, at Sunspear                 TRUE
 9 Daenerys Targaryen Valyrian Female  1303 In 284 AC, at Dragonstone              TRUE
10 Davos Seaworth     Westeros Male    1319 In 260 AC or before, at King's Landing TRUE
# … with 20 more rows
However, if an element is removed from the list, the function fails.
got_chars[[1]]["gender"]<-NULL
map_dfr(got_chars, magrittr::extract, c("name", "culture", "gender", "id", "born", "alive"))
#Error: Argument 3 is a list, must contain atomic vectors
The desired output would be an NA value for the missing element. What would an elegant solution be? I suspect the solution includes using purrr::possibly(), but I haven't figured it out yet.
The devel version of tidyr has powerful new "unnesting" functions, and they can handle this problematic data (Option 1). Another approach is to attack the problem column-wise, which lets you use the .default argument to purrr::map(), which provides a value to use for missing elements (Option 2).
library(tidyverse) # purrr, tidyr, and dplyr
library(repurrrsive) # The data comes from this package
got_chars_mutilated <- got_chars
got_chars_mutilated[[1]]["gender"] <- NULL
# original problem
map_dfr(
  got_chars_mutilated,
  magrittr::extract,
  c("name", "culture", "gender", "id", "born", "alive")
)
#> Error: Argument 3 is a list, must contain atomic vectors
# Option 1:
# expanded unnest_*() functions coming soon in tidyr
packageVersion("tidyr")
#> [1] '0.8.99.9000'
# automatic unnesting leads to ... unnest_wider()
tibble(got = got_chars_mutilated) %>%
  unnest_auto(got)
#> Using `unnest_wider(got)`; elements have {n_common} names in common
#> # A tibble: 30 x 18
#>    url      id name  culture born  died  alive titles aliases father mother
#>    <chr> <int> <chr> <chr>   <chr> <chr> <lgl> <list> <list>  <chr>  <chr>
#>  1 http…  1022 Theo… Ironbo… In 2… ""    TRUE  <chr … <chr [… ""     ""
#>  2 http…  1052 Tyri… ""      In 2… ""    TRUE  <chr … <chr [… ""     ""
#>  3 http…  1074 Vict… Ironbo… In 2… ""    TRUE  <chr … <chr [… ""     ""
#>  4 http…  1109 Will  ""      ""    In 2… FALSE <chr … <chr [… ""     ""
#>  5 http…  1166 Areo… Norvos… In 2… ""    TRUE  <chr … <chr [… ""     ""
#>  6 http…  1267 Chett ""      At H… In 2… FALSE <chr … <chr [… ""     ""
#>  7 http…  1295 Cres… ""      In 2… In 2… FALSE <chr … <chr [… ""     ""
#>  8 http…   130 Aria… Dornish In 2… ""    TRUE  <chr … <chr [… ""     ""
#>  9 http…  1303 Daen… Valyri… In 2… ""    TRUE  <chr … <chr [… ""     ""
#> 10 http…  1319 Davo… Wester… In 2… ""    TRUE  <chr … <chr [… ""     ""
#> # … with 20 more rows, and 7 more variables: spouse <chr>,
#> # allegiances <list>, books <list>, povBooks <list>, tvSeries <list>,
#> # playedBy <list>, gender <chr>
# let's do it again, calling the proper function, and inspect `gender`
tibble(got = got_chars_mutilated) %>%
  unnest_wider(got) %>%
  pull(gender)
#> [1] NA "Male" "Male" "Male" "Male" "Male" "Male"
#> [8] "Female" "Female" "Male" "Female" "Male" "Female" "Male"
#> [15] "Male" "Male" "Female" "Female" "Female" "Male" "Male"
#> [22] "Male" "Male" "Male" "Male" "Female" "Male" "Male"
#> [29] "Male" "Female"
# Option 2:
# attack this column-wise
# mapping the names gives access to the `.default` argument for missing elements
c("name", "culture", "gender", "id", "born", "alive") %>%
  set_names() %>%
  map(~ map(got_chars_mutilated, .x, .default = NA)) %>%
  map(simplify) %>%
  as_tibble()
#> # A tibble: 30 x 6
#>    name           culture  gender      id born                        alive
#>    <chr>          <chr>    <list>   <int> <chr>                       <lgl>
#>  1 Theon Greyjoy  Ironborn <lgl [1…  1022 In 278 AC or 279 AC, at Py… TRUE
#>  2 Tyrion Lannis… ""       <chr [1…  1052 In 273 AC, at Casterly Rock TRUE
#>  3 Victarion Gre… Ironborn <chr [1…  1074 In 268 AC or before, at Py… TRUE
#>  4 Will           ""       <chr [1…  1109 ""                          FALSE
#>  5 Areo Hotah     Norvoshi <chr [1…  1166 In 257 AC or before, at No… TRUE
#>  6 Chett          ""       <chr [1…  1267 At Hag's Mire               FALSE
#>  7 Cressen        ""       <chr [1…  1295 In 219 AC or 220 AC         FALSE
#>  8 Arianne Marte… Dornish  <chr [1…   130 In 276 AC, at Sunspear      TRUE
#>  9 Daenerys Targ… Valyrian <chr [1…  1303 In 284 AC, at Dragonstone   TRUE
#> 10 Davos Seaworth Westeros <chr [1…  1319 In 260 AC or before, at Ki… TRUE
#> # … with 20 more rows
Created on 2019-08-15 by the reprex package (v0.3.0.9000)
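In the Option 2 output above, gender comes back as a list column, presumably because the logical NA default does not match the character values around it. One possible way around that is a typed map_*() call per column with a matching typed NA as the .default. The following is only a sketch: the two-element chars list is a made-up stand-in for got_chars, not the real data, and the column set is shortened for brevity.

```r
library(purrr)
library(tibble)

# Hypothetical two-element stand-in for got_chars; the first entry lacks `gender`.
chars <- list(
  list(name = "Theon Greyjoy", id = 1022L, alive = TRUE),
  list(name = "Tyrion Lannister", gender = "Male", id = 1052L, alive = TRUE)
)

# One typed map per column; `.default` supplies the value for absent entries,
# and the typed NA keeps each column an atomic vector of the expected type.
chars_df <- tibble(
  name   = map_chr(chars, "name",   .default = NA_character_),
  gender = map_chr(chars, "gender", .default = NA_character_),
  id     = map_int(chars, "id",     .default = NA_integer_),
  alive  = map_lgl(chars, "alive",  .default = NA)
)

chars_df
```

The trade-off is that each column's type has to be spelled out by hand, which is more typing than Option 2 but fails loudly if a field holds an unexpected type.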
One way is to define a partial()ly-specified pluck() that extracts a name of interest, returning NA if it's missing. Pass the modified pluck() to a double-map, with the inner map traversing the names to extract and the outer map traversing your got_chars list:
v <- set_names(c("name", "culture", "gender", "id", "born", "alive"))
map_dfr( got_chars, ~map(v, partial(pluck, .x, .default=NA)) )
# # A tibble: 30 x 6
#    name             culture  gender    id born                             alive
#    <chr>            <chr>    <chr>  <int> <chr>                            <lgl>
#  1 Theon Greyjoy    Ironborn NA      1022 In 278 AC or 279 AC, at Pyke     TRUE
#  2 Tyrion Lannister ""       Male    1052 In 273 AC, at Casterly Rock      TRUE
#  3 Victarion Greyj… Ironborn Male    1074 In 268 AC or before, at Pyke     TRUE
#  4 Will             ""       Male    1109 ""                               FALSE
#  5 Areo Hotah       Norvoshi Male    1166 In 257 AC or before, at Norvos   TRUE
#  6 Chett            ""       Male    1267 At Hag's Mire                    FALSE
#  7 Cressen          ""       Male    1295 In 219 AC or 220 AC              FALSE
#  8 Arianne Martell  Dornish  Female   130 In 276 AC, at Sunspear           TRUE
#  9 Daenerys Targar… Valyrian Female  1303 In 284 AC, at Dragonstone        TRUE
# 10 Davos Seaworth   Westeros Male    1319 In 260 AC or before, at King's … TRUE
# # … with 20 more rows
To clarify, .x iterates over got_chars because it lives inside a lambda function specified with ~, so it corresponds to the outer map. The function for the inner map is specified with partial(), which attaches the currently looked-at got_chars element (i.e., the .x) as the first argument to pluck(). The modified pluck() then accepts the name to extract as its (new) first argument, so it can be passed to the inner map as-is, without any extra ~ needed.
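To see the two pieces in isolation, here is a small sketch on a single made-up record. The names person, pluck_person, fields and picked are illustrative only and do not come from the answer.

```r
library(purrr)

# Hypothetical single character record with no `gender` entry.
person <- list(name = "Theon Greyjoy", id = 1022L)

# pluck() falls back to `.default` when the requested element does not exist.
pluck(person, "gender", .default = NA)

# Pre-fill the list and the default; the resulting function only needs a name.
pluck_person <- partial(pluck, person, .default = NA)

# The inner map from the answer: traverse the names, extracting each in turn.
fields <- set_names(c("name", "gender", "id"))
picked <- map(fields, pluck_person)
str(picked)
```

Here picked is a named list with one entry per requested field, which is the shape map_dfr() then binds into a row for each got_chars element.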