Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How would you hoist latitude and longitude from a dataset in one stroke

I have been playing with nested lists lately and being able to extract data from deep levels. I run into a little problem with tidyr function hoist(). I am able to pull latitude and longitude for 5 and 7 addresses but with two separate commands. I was wondering if it possible to access the list structure with hoist() in such a way that extracting lat and lng takes only one command. This is the example:

library(tidyr)
library(dplyr)
library(repurrrsive)

gmaps_cities_o <- repurrrsive::gmaps_cities
gmaps_cities_o

with output:

A tibble:5 × 2
  city         json
  <chr>       <list>
 Houston     <list [2]>         
 Washington  <list [2]>         
 New York    <list [2]>         
 Chicago     <list [2]>         
 Arlington   <list [2]>         
5 rows

To extract lat and lng I have to write two pieces of code:

# extract lat, long for the first address
gmaps_cities_o %>% 
    hoist(json, 
           lat = list("results", 1, "geometry", "location", "lat"),
           lng = list("results", 1, "geometry", "location", "lng")
           )

output:

A tibble:5 × 4
 city        lat          lng         json
 <chr>       <dbl>        <dbl>      <list>
Houston     29.76043    -95.36980   <list [2]>  
Washington  47.75107    -120.74014  <list [2]>  
New York    40.71278    -74.00597   <list [2]>  
Chicago     41.87811    -87.62980   <list [2]>  
Arlington   32.73569    -97.10807   <list [2]>  
5 rows

And for the second address:

# extract lat, long for the second address
gmaps_cities_o %>% 
    hoist(json, 
           lat = list("results", 2, "geometry", "location", "lat"),
           lng = list("results", 2, "geometry", "location", "lng")
           )

with output:

A tibble:5 × 4
 city          lat          lng        json
 <chr>        <dbl>        <dbl>      <list>
Houston           NA           NA   <list [2]>  
Washington  38.90719    -77.03687   <list [2]>  
New York          NA           NA   <list [2]>  
Chicago           NA           NA   <list [2]>  
Arlington   38.87997    -77.10677   <list [2]>  
5 rows

So, two separate operations to obtain lat and lng for 7 addresses in five cities.

I could extract lat and lng with this piece of code:

gmaps_cities_o %>% 
    unnest_wider(json) %>% 
    unnest_longer(results) %>% 
    hoist(results,
          lat = list("geometry", "location", "lat"),
          lng = list("geometry", "location", "lng")
          ) %>% 
    select(city, lat, lng)

with output:

A tibble:7 × 3
 city         lat          lng
 <chr>       <dbl>        <dbl>
Houston     29.76043    -95.36980       
Washington  47.75107   -120.74014       
Washington  38.90719    -77.03687       
New York    40.71278    -74.00597       
Chicago     41.87811    -87.62980       
Arlington   32.73569    -97.10807       
Arlington   38.87997    -77.10677       
7 rows

But it just doesn't seem right that I cannot do it with hoist() in one operation. Something like this:

gmaps_cities_o %>% 
    hoist(json, 
           lat = list("results", (?), "geometry", "location", "lat"),
           lng = list("results", (?), "geometry", "location", "lng")
           )

Would anybody with experience in nested lists would give me a hint?

like image 870
f0nzie Avatar asked Sep 05 '25 03:09

f0nzie


1 Answers

This is inspired by the nice answer by I_O but is enough of a deviation that it's probably a separate answer. You can create a function, my_hoist:

my_hoist <- function(x, path) {
    x_flat <- unlist(x)
    x_flat[grepl(paste(path, collapse = "\\."), names(x_flat))]
}

This can be used in a similar fashion to hoist but without specifying the index:

gmaps_cities_o |>
    group_by(city) |>
    reframe(
        lat = my_hoist(json, c("results", "geometry", "location", "lat")),
        lng = my_hoist(json, c("results", "geometry", "location", "lng")),
    )

# # A tibble: 7 × 3
#   city       lat        lng         
#   <chr>      <chr>      <chr>       
# 1 Arlington  32.735687  -97.1080656 
# 2 Arlington  38.8799697 -77.1067698 
# 3 Chicago    41.8781136 -87.6297982 
# 4 Houston    29.7604267 -95.3698028 
# 5 New York   40.7127753 -74.0059728 
# 6 Washington 47.7510741 -120.7401386
# 7 Washington 38.9071923 -77.0368707 
like image 85
SamR Avatar answered Sep 07 '25 21:09

SamR