I have been playing with nested lists lately and being able to extract data from deep levels. I run into a little problem with tidyr
function hoist()
. I am able to pull latitude and longitude for 5 and 7 addresses but with two separate commands. I was wondering if it possible to access the list structure with hoist()
in such a way that extracting lat
and lng
takes only one command. This is the example:
library(tidyr)
library(dplyr)
library(repurrrsive)
gmaps_cities_o <- repurrrsive::gmaps_cities
gmaps_cities_o
with output:
A tibble:5 × 2
city json
<chr> <list>
Houston <list [2]>
Washington <list [2]>
New York <list [2]>
Chicago <list [2]>
Arlington <list [2]>
5 rows
To extract lat
and lng
I have to write two pieces of code:
# extract lat, long for the first address
gmaps_cities_o %>%
hoist(json,
lat = list("results", 1, "geometry", "location", "lat"),
lng = list("results", 1, "geometry", "location", "lng")
)
output:
A tibble:5 × 4
city lat lng json
<chr> <dbl> <dbl> <list>
Houston 29.76043 -95.36980 <list [2]>
Washington 47.75107 -120.74014 <list [2]>
New York 40.71278 -74.00597 <list [2]>
Chicago 41.87811 -87.62980 <list [2]>
Arlington 32.73569 -97.10807 <list [2]>
5 rows
And for the second address:
# extract lat, long for the second address
gmaps_cities_o %>%
hoist(json,
lat = list("results", 2, "geometry", "location", "lat"),
lng = list("results", 2, "geometry", "location", "lng")
)
with output:
A tibble:5 × 4
city lat lng json
<chr> <dbl> <dbl> <list>
Houston NA NA <list [2]>
Washington 38.90719 -77.03687 <list [2]>
New York NA NA <list [2]>
Chicago NA NA <list [2]>
Arlington 38.87997 -77.10677 <list [2]>
5 rows
So, two separate operations to obtain lat
and lng
for 7 addresses in five cities.
I could extract lat
and lng
with this piece of code:
gmaps_cities_o %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
hoist(results,
lat = list("geometry", "location", "lat"),
lng = list("geometry", "location", "lng")
) %>%
select(city, lat, lng)
with output:
A tibble:7 × 3
city lat lng
<chr> <dbl> <dbl>
Houston 29.76043 -95.36980
Washington 47.75107 -120.74014
Washington 38.90719 -77.03687
New York 40.71278 -74.00597
Chicago 41.87811 -87.62980
Arlington 32.73569 -97.10807
Arlington 38.87997 -77.10677
7 rows
But it just doesn't seem right that I cannot do it with hoist()
in one operation. Something like this:
gmaps_cities_o %>%
hoist(json,
lat = list("results", (?), "geometry", "location", "lat"),
lng = list("results", (?), "geometry", "location", "lng")
)
Would anybody with experience in nested lists would give me a hint?
This is inspired by the nice answer by I_O but is enough of a deviation that it's probably a separate answer. You can create a function, my_hoist
:
my_hoist <- function(x, path) {
x_flat <- unlist(x)
x_flat[grepl(paste(path, collapse = "\\."), names(x_flat))]
}
This can be used in a similar fashion to hoist but without specifying the index:
gmaps_cities_o |>
group_by(city) |>
reframe(
lat = my_hoist(json, c("results", "geometry", "location", "lat")),
lng = my_hoist(json, c("results", "geometry", "location", "lng")),
)
# # A tibble: 7 × 3
# city lat lng
# <chr> <chr> <chr>
# 1 Arlington 32.735687 -97.1080656
# 2 Arlington 38.8799697 -77.1067698
# 3 Chicago 41.8781136 -87.6297982
# 4 Houston 29.7604267 -95.3698028
# 5 New York 40.7127753 -74.0059728
# 6 Washington 47.7510741 -120.7401386
# 7 Washington 38.9071923 -77.0368707
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With