 

Transform polygon JSON coordinates into a data.frame

I want to transform one data frame into another, ideally in as few commands as possible. A dplyr or tidyr solution would be great.

To parse the coordinates list I used library(rjson). That part works, but I cannot manipulate the resulting list any further to get my result.

If you can avoid using a for loop that would be great, but any solution that solves the problem is welcome :)

Input:

df <- data.frame(code = c("12000", "89000"),
                 polygon = c("[[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]",
                             "[[[81,82], [83,84], [85,86]]]"),
                 stringsAsFactors = FALSE)  # keep strings as character (needed in R < 4.0)
df

> df
   code                                                     polygon
1 12000 [[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]
2 89000                               [[[81,82], [83,84], [85,86]]]

Input data description:

  • column code contains a postal code
  • column polygon contains one or more polygons, each defined by its latitude-longitude point pairs

Output wanted:

> wanted
       a lon lat id
1  12000  11  12  1
2  12000  13  14  1
3  12000  15  16  1
4  12000  21  22  2
5  12000  23  24  2
6  12000  25  26  2
7  89000  81  82  1
8  89000  83  84  1
9  89000  85  86  1

I want to plot the wanted data.frame using ggplot.
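For reference, once the data has the wanted shape, a minimal ggplot2 sketch could draw it as below. It rebuilds the wanted table by hand (an assumption: the coordinates are numeric) and groups on both the code and the id, since an id on its own is only unique within one postal code.

```r
library(ggplot2)

# Hand-built copy of the `wanted` data frame shown above
wanted <- data.frame(
  a   = c(rep("12000", 6), rep("89000", 3)),
  lon = c(11, 13, 15, 21, 23, 25, 81, 83, 85),
  lat = c(12, 14, 16, 22, 24, 26, 82, 84, 86),
  id  = c(1, 1, 1, 2, 2, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# One polygon per (postal code, id) combination, filled by postal code
p <- ggplot(wanted, aes(x = lon, y = lat, group = interaction(a, id), fill = a)) +
  geom_polygon(colour = "black")

print(p)
```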

asked Aug 29 '16 at 13:08 by Costin


3 Answers

purrr, dplyr and jsonlite solution:

df <- data.frame(code = c("12000", "89000"),
                 polygon = c("[[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]",
                             "[[[81,82], [83,84], [85,86]]]"),
                 stringsAsFactors=FALSE)

library(purrr)
library(dplyr)
library(jsonlite)

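# Turn one row's JSON string into a long data frame of points
make_coords <- function(x) {
  fromJSON(x$polygon, simplifyMatrix=FALSE) %>%  # list of polygons, each a list of point pairs
  map_df(~map_df(., ~setNames(as.data.frame(as.list(.)), c("lat", "lon"))), .id="id")  # id = polygon position
} 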

group_by(df, a=code) %>% 
  do(make_coords(.)) %>%
  ungroup() %>% 
  select(a, lat, lon, id)
## # A tibble: 9 x 4
##       a   lat   lon    id
##   <chr> <int> <int> <chr>
## 1 12000    11    12     1
## 2 12000    13    14     1
## 3 12000    15    16     1
## 4 12000    21    22     2
## 5 12000    23    24     2
## 6 12000    25    26     2
## 7 89000    81    82     1
## 8 89000    83    84     1
## 9 89000    85    86     1

This has the added benefit of validating the polygon data, since your original example had invalid JSON (I had to remove the final ] from it).
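To check the strings up front instead of relying on fromJSON() failing, jsonlite also provides validate(). A minimal sketch, where `bad` is a hypothetical copy of the second polygon with one stray closing bracket:

```r
library(jsonlite)

good <- "[[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]"
bad  <- "[[[81,82], [83,84], [85,86]]]]"  # hypothetical: one ] too many

ok_good <- validate(good)
ok_bad  <- validate(bad)

# When validation fails, the parser's message is kept in the "err" attribute
print(attr(ok_bad, "err"))

# fromJSON() raises an error on the same input; return NULL instead
parsed <- tryCatch(fromJSON(bad), error = function(e) NULL)
```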

NOTES:

  1. The group_by could be replaced by dplyr::rowwise or (with some other code changes) by by_row (formerly purrr::by_row, now in the purrrlyr package).
  2. The idiom is to iterate through each code, convert the JSON into a list of coordinates, iterate through that list to make a data frame out of each polygon, and assign each polygon its positional ID.
  3. The column names you want are assigned in three places: the initial group_by (to turn code into a), the innermost map_df (for lat & lon), and finally id, which is auto-created by the .id argument of the outermost map_df.
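The nested-iteration idiom from note 2 does not depend on purrr. Here is a base-R sketch of the same two loops. The input list is built by hand to mimic what fromJSON(..., simplifyMatrix=FALSE) returns per code (an assumption), and `polygon_rows` is a hypothetical helper name.

```r
# Hand-built stand-in for the parsed JSON: one entry per postal code, each a
# list of polygons, each polygon a list of c(lat, lon) pairs
parsed <- list(
  "12000" = list(
    list(c(11, 12), c(13, 14), c(15, 16)),
    list(c(21, 22), c(23, 24), c(25, 26))
  ),
  "89000" = list(
    list(c(81, 82), c(83, 84), c(85, 86))
  )
)

# Inner loop: one data frame per polygon, tagged with its position (the id)
polygon_rows <- function(polys) {
  do.call(rbind, lapply(seq_along(polys), function(i) {
    pts <- do.call(rbind, polys[[i]])  # n x 2 matrix of points
    data.frame(lat = pts[, 1], lon = pts[, 2], id = i)
  }))
}

# Outer loop: one block per postal code, stored in column `a`
out <- do.call(rbind, lapply(names(parsed), function(code) {
  data.frame(a = code, polygon_rows(parsed[[code]]), stringsAsFactors = FALSE)
}))
rownames(out) <- NULL
print(out)
```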

rowwise version:

make_coords2 <- function(x) {
  fromJSON(x$polygon, simplifyMatrix=FALSE) %>% 
    map_df(~map_df(., ~setNames(as.data.frame(as.list(.)), c("lat", "lon"))), .id="id") %>% 
    mutate(a=x$a)
}

select(df, a=code, polygon) %>% 
  rowwise() %>% 
  do(make_coords2(.)) %>%
  ungroup() %>% 
  select(a, lat, lon, id)

by_row version:

library(purrrlyr)  # by_row() has moved from purrr to the purrrlyr package

make_coords3 <- function(x) {
  fromJSON(x$polygon, simplifyMatrix=FALSE) %>% 
    map_df(~map_df(., ~setNames(as.data.frame(as.list(.)), c("lat", "lon"))), .id="id")
}

select(df, a=code, polygon) %>% 
  by_row(make_coords3, .collate="rows") %>% 
  select(a, lat, lon, id)
answered Oct 16 '22 at 05:10 by hrbrmstr


This isn't pretty, but some calls to strsplit, gsub, and unnest can do quite a bit:

  • Splitting on ]], separates the individual polygons.
  • We then spread those over separate rows.
  • The id column can then be created with row_number within each code.
  • Split again, on ], to separate the point pairs.
  • Put those on separate rows again.
  • Remove all [ and ].
  • Split on , to separate lon and lat.
  • Put those in separate columns.


library(dplyr)
library(tidyr)

df %>% 
  mutate(polygon = strsplit(polygon, ']],')) %>%  # one element per polygon
  unnest(polygon) %>% 
  group_by(code) %>% 
  mutate(id = row_number(),                       # polygon number within each code
         polygon = strsplit(polygon, '],')) %>%   # one element per point
  unnest(polygon) %>% 
  mutate(polygon = gsub(']|\\[', '', polygon),    # strip all brackets
         polygon = strsplit(polygon, ','),
         lon = sapply(polygon, '[', 1),
         lat = sapply(polygon, '[', 2)) %>% 
  select(-polygon)
Source: local data frame [9 x 4]
Groups: code [2]

   code    id   lon   lat
  <chr> <int> <chr> <chr>
1 12000     1    11    12
2 12000     1    13    14
3 12000     1    15    16
4 12000     2    21    22
5 12000     2    23    24
6 12000     2    25    26
7 89000     1    81    82
8 89000     1    83    84
9 89000     1    85    86
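The pipeline above returns lon and lat as character columns. Below is a base-R sketch of the same splitting rules that also converts the coordinates to numbers, which is useful before plotting. `split_polygons` is a hypothetical helper name, and the input assumes character (not factor) columns.

```r
df <- data.frame(code = c("12000", "89000"),
                 polygon = c("[[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]",
                             "[[[81,82], [83,84], [85,86]]]"),
                 stringsAsFactors = FALSE)

# Split one polygon string into a data frame of points, one row per point
split_polygons <- function(code, polygon) {
  polys <- strsplit(polygon, "]],", fixed = TRUE)[[1]]      # one string per polygon
  do.call(rbind, lapply(seq_along(polys), function(i) {
    pts <- strsplit(polys[i], "],", fixed = TRUE)[[1]]      # one string per point
    pts <- gsub("\\[|\\]", "", pts)                         # drop every [ and ]
    xy  <- do.call(rbind, strsplit(pts, ",", fixed = TRUE)) # n x 2 character matrix
    data.frame(code = code, id = i,
               lon = as.numeric(xy[, 1]),  # as.numeric() also ignores the
               lat = as.numeric(xy[, 2]),  # leading spaces left by the split
               stringsAsFactors = FALSE)
  }))
}

res <- do.call(rbind, Map(split_polygons, df$code, df$polygon))
rownames(res) <- NULL
print(res)
```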
answered Oct 16 '22 at 04:10 by Axeman


I think df$polygon[2] has one closing bracket too many. Once that is removed, you could do the following:

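library(jsonlite)
library(reshape2)

# fromJSON() simplifies each string to a 3-D array indexed
# [polygon, point, coordinate]; melt() turns it into long format
parse_json <- function(polygon, code){
  molten <- melt(fromJSON(polygon))
  lat <- molten[which(molten$Var3==1), "value"]
  lon <- molten[which(molten$Var3==2), "value"]
  id <- molten[which(molten$Var3==1), "Var1"]
  data.frame(code, lat, lon, id)
}

# as.character() guards against factor columns (the default in R < 4.0)
dat_raw <- mapply(parse_json, as.character(df$polygon), as.character(df$code),
                  SIMPLIFY = FALSE, USE.NAMES = FALSE)
do.call(rbind, dat_raw)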

Which gives you:

   code lat lon id
1 12000  11  12  1
2 12000  21  22  2
3 12000  13  14  1
4 12000  23  24  2
5 12000  15  16  1
6 12000  25  26  2
7 89000  81  82  1
8 89000  83  84  1
9 89000  85  86  1
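The rows for code 12000 come out interleaved because melt() lays out an array with its first index (the polygon) varying fastest. Here is a base-R sketch of that layout, with a hand-built 2 x 3 x 2 array standing in for what fromJSON() is assumed to return for the first string, followed by one way to restore the wanted order.

```r
# Stand-in for fromJSON() on the first string: [polygon, point, coordinate]
arr <- array(NA_real_, dim = c(2, 3, 2))
arr[1, , ] <- rbind(c(11, 12), c(13, 14), c(15, 16))
arr[2, , ] <- rbind(c(21, 22), c(23, 24), c(25, 26))

# Same long layout as melt() on an array: Var1 varies fastest, like as.vector()
molten <- cbind(expand.grid(Var1 = 1:2, Var2 = 1:3, Var3 = 1:2),
                value = as.vector(arr))

lat <- molten$value[molten$Var3 == 1]
lon <- molten$value[molten$Var3 == 2]
id  <- molten$Var1[molten$Var3 == 1]
res <- data.frame(lat, lon, id)

# order() keeps ties in their original order, so points stay in sequence
sorted <- res[order(res$id), ]
print(sorted)
```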
answered Oct 16 '22 at 03:10 by Rentrop