I want to transform a data frame into another data frame, ideally in as few commands as possible. A solution using dplyr or tidyr would be great.
To parse the coordinates list I used library(rjson). That part works, but I cannot manipulate the resulting list further to get my result.
It would be great if you could avoid any for statements, but any solution is good as long as it solves the problem :)
Input:
df <- data.frame(code = c("12000", "89000"),
                 polygon = c("[[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]",
                             "[[[81,82], [83,84], [85,86]]]"))
df
> df
code polygon
1 12000 [[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]
2 89000 [[[81,82], [83,84], [85,86]]]
Input data description:
code contains the postal code
polygon contains one or more polygons defined by their latitude-longitude pairs of points
Output wanted:
> wanted
a lon lat id
1 12000 11 12 1
2 12000 13 14 1
3 12000 15 16 1
4 12000 21 22 2
5 12000 23 24 2
6 12000 25 26 2
7 89000 81 82 1
8 89000 83 84 1
9 89000 85 86 1
I want to plot the wanted data.frame using ggplot.
purrr, dplyr and jsonlite solution:
df <- data.frame(code = c("12000", "89000"),
                 polygon = c("[[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]",
                             "[[[81,82], [83,84], [85,86]]]"),
                 stringsAsFactors=FALSE)
library(purrr)
library(dplyr)
library(jsonlite)
make_coords <- function(x) {
  fromJSON(x$polygon, simplifyMatrix=FALSE) %>%
    map_df(~map_df(., ~setNames(as.data.frame(as.list(.)), c("lat", "lon"))), .id="id")
}
group_by(df, a=code) %>%
  do(make_coords(.)) %>%
  ungroup() %>%
  select(a, lat, lon, id)
## # A tibble: 9 x 4
## a lat lon id
## <chr> <int> <int> <chr>
## 1 12000 11 12 1
## 2 12000 13 14 1
## 3 12000 15 16 1
## 4 12000 21 22 2
## 5 12000 23 24 2
## 6 12000 25 26 2
## 7 89000 81 82 1
## 8 89000 83 84 1
## 9 89000 85 86 1
This has the added benefit of validating the polygon data since your example ha[ds] invalid JSON (I had to edit out the final ] in the initial example).
NOTES:
group_by could be replaced by dplyr::rowwise or (with some other code changes) by purrr::by_row
The basic idea is to process each row by code, convert the JSON into a list of coordinates, iterate through that list, make a data frame out of each polygon, and assign the positional ID to it.
The column naming happens in group_by (to turn code into a), the innermost map_df (for lat & lon) and finally id, which is auto-created by the outermost map_df.
rowwise version:
make_coords2 <- function(x) {
  fromJSON(x$polygon, simplifyMatrix=FALSE) %>%
    map_df(~map_df(., ~setNames(as.data.frame(as.list(.)), c("lat", "lon"))), .id="id") %>%
    mutate(a=x$a)
}
select(df, a=code, polygon) %>%
  rowwise() %>%
  do(make_coords2(.)) %>%
  ungroup() %>%
  select(a, lat, lon, id)
by_row version:
# Note: in current releases by_row() lives in the purrrlyr package, not purrr
make_coords3 <- function(x) {
  fromJSON(x$polygon, simplifyMatrix=FALSE) %>%
    map_df(~map_df(., ~setNames(as.data.frame(as.list(.)), c("lat", "lon"))), .id="id")
}
select(df, a=code, polygon) %>%
  by_row(make_coords3, .collate="rows") %>%
  select(a, lat, lon, id)
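The idiom described in the notes (parse each row's JSON, walk the polygons, tag each one with its position) can also be sketched with only jsonlite and base R, in case do() or by_row() are not available. This is a minimal sketch, not the approach used above: polygon_to_rows is a hypothetical helper name, and the sketch assumes each point is stored as [lon, lat], which is how the wanted output in the question reads.

```r
library(jsonlite)

df <- data.frame(code = c("12000", "89000"),
                 polygon = c("[[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]",
                             "[[[81,82], [83,84], [85,86]]]"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: turn one (code, polygon) pair into a long data frame.
# Assumption: each point is [lon, lat], as in the question's wanted output.
polygon_to_rows <- function(code, polygon) {
  # Nested lists: a list of polygons, each a list of two-element points
  rings <- fromJSON(polygon, simplifyVector = FALSE)
  do.call(rbind, lapply(seq_along(rings), function(i) {
    # Stack the points of polygon i into an n x 2 matrix
    pts <- do.call(rbind, lapply(rings[[i]], unlist))
    data.frame(a = code, lon = pts[, 1], lat = pts[, 2], id = i,
               stringsAsFactors = FALSE)
  }))
}

# Apply the helper to every row and bind the pieces together
wanted <- do.call(rbind, Map(polygon_to_rows, df$code, df$polygon))
rownames(wanted) <- NULL
```

Rows stay grouped by polygon and in their original point order, which matters if the result is later drawn as polygons.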
This isn't pretty, but some calls to strsplit, gsub, and unnest can do quite a bit:
Splitting on ]], separates the individual polygons.
The id column can then be built with row_number within each code.
Splitting on ], separates the point pairs.
gsub removes the remaining [ and ].
Splitting on , separates lon and lat.
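library(dplyr)
library(tidyr)

# polygon must be a character column (stringsAsFactors = FALSE) for strsplit
df %>%
  mutate(polygon = strsplit(polygon, ']],')) %>%   # one element per polygon
  unnest(polygon) %>%
  group_by(code) %>%
  mutate(id = row_number(),                        # polygon index within each code
         polygon = strsplit(polygon, '],')) %>%    # one element per point pair
  unnest(polygon) %>%
  mutate(polygon = gsub(']|\\[', '', polygon),     # drop the remaining brackets
         polygon = strsplit(polygon, ','),
         lon = sapply(polygon, '[', 1),
         lat = sapply(polygon, '[', 2)) %>%
  select(-polygon)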
Source: local data frame [9 x 4]
Groups: code [2]

   code    id   lon   lat
  <chr> <int> <chr> <chr>
1 12000     1    11    12
2 12000     1    13    14
3 12000     1    15    16
4 12000     2    21    22
5 12000     2    23    24
6 12000     2    25    26
7 89000     1    81    82
8 89000     1    83    84
9 89000     1    85    86
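Note that lon and lat come out of this approach as character columns (<chr>), since they are produced by string splitting. For plotting they would likely need to be numeric. A minimal sketch of the conversion, using a small stand-in data frame named res (a hypothetical name, not part of the code above):

```r
# Stand-in for the result above: coordinates stored as character,
# one of them with the leading space that splitting "21, 22" on "," leaves behind
res <- data.frame(code = c("12000", "12000"),
                  id   = c(1L, 2L),
                  lon  = c("11", "21"),
                  lat  = c("12", " 22"),
                  stringsAsFactors = FALSE)

# as.numeric() tolerates surrounding whitespace
res$lon <- as.numeric(res$lon)
res$lat <- as.numeric(res$lat)
```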
I think there is one closing bracket too many in df$polygon[2]. If that is removed, you could do the following:
require(jsonlite)
require(reshape2)
parse_json <- function(polygon, code){
  molten <- melt(fromJSON(polygon))
  lat <- molten[which(molten$Var3==1), "value"]
  lon <- molten[which(molten$Var3==2), "value"]
  id <- molten[which(molten$Var3==1), "Var1"]
  data.frame(code, lat, lon, id)
}
dat_raw <- mapply(parse_json, df$polygon, df$code, SIMPLIFY = FALSE, USE.NAMES = FALSE)
do.call(rbind, dat_raw)
Which gives you:
code lat lon id
1 12000 11 12 1
2 12000 21 22 2
3 12000 13 14 1
4 12000 23 24 2
5 12000 15 16 1
6 12000 25 26 2
7 89000 81 82 1
8 89000 83 84 1
9 89000 85 86 1