Flatten nested lists of different lengths (Google geocode API output) in R

I have been using the Google Geocoding API to geocode lists of addresses. It returns results as nested lists. The elements in each result can vary, and a partial match can return several results for one address, so the top level may hold more than one nested list. So far, I have saved each GoogleResult into a single data frame cell.

Here is an example of my dataframe:

    df <- structure(list(address = structure(c(3L, 1L, 2L), .Label = c("115 Civic Parade, Altona VIC 3018", 
"Civic Parade, Altona VIC 3018", "EAST LA CLARKEFIELD 3430"), class = "factor"), 
    GoogleResult = list(list(list(access_points = list(), address_components = list(
        list(long_name = "Los Angeles", short_name = "Los Angeles", 
            types = list("locality", "political")), list(long_name = "Los Angeles County", 
            short_name = "Los Angeles County", types = list("administrative_area_level_2", 
                "political")), list(long_name = "California", 
            short_name = "CA", types = list("administrative_area_level_1", 
                "political")), list(long_name = "United States", 
            short_name = "US", types = list("country", "political"))), 
        formatted_address = "Los Angeles, CA, USA", geometry = list(
            bounds = list(northeast = list(lat = 34.3373061, 
                lng = -118.1552891), southwest = list(lat = 33.7036519, 
                lng = -118.6681759)), location = list(lat = 34.0522342, 
                lng = -118.2436849), location_type = "APPROXIMATE", 
            viewport = list(northeast = list(lat = 34.3373061, 
                lng = -118.1552891), southwest = list(lat = 33.7036519, 
                lng = -118.6681759))), partial_match = TRUE, 
        place_id = "ChIJE9on3F3HwoAR9AhGJW_fL-I", types = list(
            "locality", "political")), list(access_points = list(), 
        address_components = list(list(long_name = "3430", short_name = "3430", 
            types = list("postal_code")), list(long_name = "Clarkefield", 
            short_name = "Clarkefield", types = list("locality", 
                "political")), list(long_name = "Victoria", short_name = "VIC", 
            types = list("administrative_area_level_1", "political")), 
            list(long_name = "Australia", short_name = "AU", 
                types = list("country", "political"))), formatted_address = "Clarkefield VIC 3430, Australia", 
        geometry = list(bounds = list(northeast = list(lat = -37.4364578, 
            lng = 144.8986988), southwest = list(lat = -37.5280439, 
            lng = 144.7012193)), location = list(lat = -37.497542, 
            lng = 144.8071366), location_type = "APPROXIMATE", 
            viewport = list(northeast = list(lat = -37.4364578, 
                lng = 144.8986988), southwest = list(lat = -37.5280439, 
                lng = 144.7012193))), partial_match = TRUE, place_id = "ChIJS3IdP-xX1moRkD8uRnhWBBw", 
        types = list("postal_code"))), list(list(access_points = list(), 
        address_components = list(list(long_name = "115", short_name = "115", 
            types = list("street_number")), list(long_name = "Civic Parade", 
            short_name = "Civic Parade", types = list("route")), 
            list(long_name = "Altona", short_name = "Altona", 
                types = list("locality", "political")), list(
                long_name = "Hobsons Bay City", short_name = "Hobsons Bay", 
                types = list("administrative_area_level_2", "political")), 
            list(long_name = "Victoria", short_name = "VIC", 
                types = list("administrative_area_level_1", "political")), 
            list(long_name = "Australia", short_name = "AU", 
                types = list("country", "political")), list(long_name = "3018", 
                short_name = "3018", types = list("postal_code"))), 
        formatted_address = "115 Civic Parade, Altona VIC 3018, Australia", 
        geometry = list(bounds = list(northeast = list(lat = -37.8633208, 
            lng = 144.8316509), southwest = list(lat = -37.86409, 
            lng = 144.8303929)), location = list(lat = -37.863727, 
            lng = 144.8310159), location_type = "ROOFTOP", viewport = list(
            northeast = list(lat = -37.8623564197085, lng = 144.832370880292), 
            southwest = list(lat = -37.8650543802915, lng = 144.829672919709))), 
        place_id = "ChIJBXz75NRj1moRpVRt21nooQw", types = list(
            "premise"))), list(list(access_points = list(), address_components = list(
        list(long_name = "Civic Parade", short_name = "Civic Parade", 
            types = list("route")), list(long_name = "Altona", 
            short_name = "Altona", types = list("locality", "political")), 
        list(long_name = "Hobsons Bay City", short_name = "Hobsons Bay", 
            types = list("administrative_area_level_2", "political")), 
        list(long_name = "Victoria", short_name = "VIC", types = list(
            "administrative_area_level_1", "political")), list(
            long_name = "Australia", short_name = "AU", types = list(
                "country", "political")), list(long_name = "3018", 
            short_name = "3018", types = list("postal_code"))), 
        formatted_address = "Civic Parade, Altona VIC 3018, Australia", 
        geometry = list(bounds = list(northeast = list(lat = -37.8626502, 
            lng = 144.8449271), southwest = list(lat = -37.8661171, 
            lng = 144.81081)), location = list(lat = -37.864412, 
            lng = 144.8303004), location_type = "GEOMETRIC_CENTER", 
            viewport = list(northeast = list(lat = -37.8626502, 
                lng = 144.8449271), southwest = list(lat = -37.8661171, 
                lng = 144.81081))), place_id = "EihDaXZpYyBQYXJhZGUsIEFsdG9uYSBWSUMgMzAxOCwgQXVzdHJhbGlhIi4qLAoUChIJtbGXUCti1moRKcxHhdx2QrYSFAoSCSEyccGdYdZqEXDajCF1VgQF", 
        types = list("route"))))), row.names = c(NA, -3L), class = "data.frame")

The first case is a partial match, which returns two nested lists of results.

My expected output is:

  • a data frame with all elements of all lists as columns
  • all columns to be named appropriately
  • Partial matches have more than one result; these can either become one row per match or widen the data frame with 'address2'-style variables. I can work with either.

I tried things like:

    lapply(df$GoogleResult, data.frame, stringsAsFactors = FALSE)

but the elements differ in length, resulting in:

    arguments imply differing number of rows: 0, 1
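
A likely source of this error is the empty `access_points = list()` element, which contributes zero rows while the scalar fields contribute one. The sketch below is a minimal reproduction on a hypothetical two-field result (`res` is made up for illustration, not taken from the API), followed by one possible workaround: dropping zero-length elements before the conversion.

```r
# Hypothetical, cut-down geocode result: one empty list and one scalar field.
res <- list(access_points = list(),
            formatted_address = "Los Angeles, CA, USA")

# data.frame() sees a 0-row element next to a 1-row element and stops.
msg <- tryCatch(
  {
    data.frame(res, stringsAsFactors = FALSE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Possible workaround: keep only elements with length > 0 before converting.
non_empty <- Filter(length, res)
ok <- data.frame(non_empty, stringsAsFactors = FALSE)
print(ok)
```

This only addresses the empty element; the deeper nesting in `address_components` and `geometry` still needs flattening.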

In case of partial matches, the results could be shown as two rows in the dataframe, or as an additional set of columns.

Luc asked May 27 '20 00:05

1 Answer

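Can you try something like this:

    library(dplyr)     # %>% and filter()
    library(tidyr)     # unnest()
    library(purrr)     # map2()
    library(jsonlite)  # toJSON(), fromJSON()

    df %>%
      unnest(cols = GoogleResult) %>%          # first level: the matches per address
      unnest(cols = GoogleResult) %>%          # second level: the fields of each match
      filter(lengths(GoogleResult) > 0) %>%    # drop empty elements
      {map2(.$GoogleResult, .$address,
            ~ cbind(address = .y, data.frame(fromJSON(toJSON(.x)))) %>% unnest())} %>%
      plyr::rbind.fill()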

KU99 answered Oct 05 '22 12:10
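
For comparison, here is a base R sketch of the "one row per match" shape the question asks for. It is not part of the answer above. It assumes a toy data frame `toy` that mimics the structure of `df$GoogleResult` with made-up values, and the helper `flatten_result()` is a hypothetical name. `unlist()` builds dotted column names such as `geometry.location.lat`, drops empty lists, and coerces all values to character, so numeric columns would need converting back afterwards.

```r
# Toy stand-in for df: address "A" has two partial matches, "B" has one.
toy <- data.frame(address = c("A", "B"), stringsAsFactors = FALSE)
toy$GoogleResult <- list(
  list(
    list(access_points = list(), formatted_address = "Match 1",
         geometry = list(location = list(lat = 1, lng = 2)),
         partial_match = TRUE, types = list("locality", "political")),
    list(access_points = list(), formatted_address = "Match 2",
         geometry = list(location = list(lat = 3, lng = 4)),
         partial_match = TRUE, types = list("postal_code"))
  ),
  list(
    list(access_points = list(), formatted_address = "Match 3",
         geometry = list(location = list(lat = 5, lng = 6)),
         types = list("premise"))
  )
)

# Flatten one match into a one-row data frame of character columns.
flatten_result <- function(address, result) {
  flat <- unlist(result)                    # nested names joined with "."
  names(flat) <- make.unique(names(flat))   # repeated names get a suffix
  data.frame(address = address, as.list(flat),
             stringsAsFactors = FALSE, check.names = FALSE)
}

# One row per match, carrying the original address along.
rows <- list()
for (i in seq_len(nrow(toy))) {
  for (result in toy$GoogleResult[[i]]) {
    rows[[length(rows) + 1]] <- flatten_result(toy$address[i], result)
  }
}

# Row-bind, filling fields that a given match lacks with NA.
all_cols <- unique(unlist(lapply(rows, names)))
flat_df <- do.call(rbind, lapply(rows, function(r) {
  for (nm in setdiff(all_cols, names(r))) r[[nm]] <- NA_character_
  r[all_cols]
}))
print(flat_df)
```

On the real data, repeated fields such as the `long_name` entries inside `address_components` would end up as suffixed columns, so picking out specific components by their `types` may be a cleaner design than flattening everything.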