I want to transform a dataframe into another dataframe. If possible, doing it in fewer commands using dplyr or tidyr would be great.
To parse the coordinates list I used library(rjson). That part is OK, but I cannot manipulate the resulting list further to get my result.
If you can avoid using any for statement, that would be great, but any solution is good as long as it solves the problem :)
Input:
df <- data.frame(code = c("12000", "89000"),
polygon = c("[[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]",
"[[[81,82], [83,84], [85,86]]]"))
df
> df
code polygon
1 12000 [[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]
2 89000 [[[81,82], [83,84], [85,86]]]
Input data description:
- code contains the postal code
- polygon contains one or more polygons defined by their latitude-longitude pairs of points
Output wanted:
> wanted
a lon lat id
1 12000 11 12 1
2 12000 13 14 1
3 12000 15 16 1
4 12000 21 22 2
5 12000 23 24 2
6 12000 25 26 2
7 89000 81 82 1
8 89000 83 84 1
9 89000 85 86 1
I want to plot the wanted data.frame using ggplot.
purrr, dplyr and jsonlite solution:
df <- data.frame(code = c("12000", "89000"),
                 polygon = c("[[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]",
                             "[[[81,82], [83,84], [85,86]]]"),
                 stringsAsFactors=FALSE)
library(purrr)
library(dplyr)
library(jsonlite)
make_coords <- function(x) {
  fromJSON(x$polygon, simplifyMatrix=FALSE) %>%
    map_df(~map_df(., ~setNames(as.data.frame(as.list(.)), c("lat", "lon"))), .id="id")
}
group_by(df, a=code) %>%
  do(make_coords(.)) %>%
  ungroup() %>%
  select(a, lat, lon, id)
## # A tibble: 9 x 4
## a lat lon id
## <chr> <int> <int> <chr>
## 1 12000 11 12 1
## 2 12000 13 14 1
## 3 12000 15 16 1
## 4 12000 21 22 2
## 5 12000 23 24 2
## 6 12000 25 26 2
## 7 89000 81 82 1
## 8 89000 83 84 1
## 9 89000 85 86 1
This has the added benefit of validating the polygon data, since your example ha[ds] invalid JSON (I had to edit out the final ] in the initial example).
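To make the validation point concrete, here is a minimal sketch that checks each polygon string before parsing. It assumes jsonlite's validate() function; the two sample strings and the names ok and extra_end are made up for illustration, the second one carrying the kind of stray closing bracket mentioned above.

```r
library(jsonlite)

# Hypothetical sample strings: the second has one closing bracket too many
polygons <- c(
  ok        = "[[[81,82], [83,84], [85,86]]]",
  extra_end = "[[[81,82], [83,84], [85,86]]]]"
)

# validate() returns TRUE/FALSE (with the parser message attached as an attribute);
# as.logical() drops the attribute so we keep a plain named logical vector
is_valid <- vapply(polygons,
                   function(p) isTRUE(as.logical(validate(p))),
                   logical(1))

# Names of the entries that would make fromJSON() fail
names(is_valid)[!is_valid]
```

Running a check like this first lets you report the offending rows instead of having the whole pipeline stop at the first bad string.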
NOTES:
- group_by could be replaced by dplyr::rowwise or (with some other code changes) by purrr::by_row
- The general idea is to work through the rows by code, convert the JSON into a list of coordinates, iterate through that list, make a data frame out of each polygon, and assign the positional ID to it.
- The column names come from group_by (to turn code into a), the innermost map_df (for lat & lon) and finally id, which is auto-created by the outermost map_df.
rowwise version:
make_coords2 <- function(x) {
  fromJSON(x$polygon, simplifyMatrix=FALSE) %>%
    map_df(~map_df(., ~setNames(as.data.frame(as.list(.)), c("lat", "lon"))), .id="id") %>%
    mutate(a=x$a)
}
select(df, a=code, polygon) %>%
  rowwise() %>%
  do(make_coords2(.)) %>%
  ungroup() %>%
  select(a, lat, lon, id)
by_row version:
library(purrrlyr)  # by_row() was moved out of purrr into the purrrlyr package
make_coords3 <- function(x) {
  fromJSON(x$polygon, simplifyMatrix=FALSE) %>%
    map_df(~map_df(., ~setNames(as.data.frame(as.list(.)), c("lat", "lon"))), .id="id")
}
select(df, a=code, polygon) %>%
  by_row(make_coords3, .collate="rows") %>%
  select(a, lat, lon, id)
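The same idiom (parse each row's JSON, walk the polygons, number them by position) can also be sketched with only jsonlite and base R, which may help if the purrr helpers above are not available. This is a rough sketch, not a drop-in replacement: polygon_points is a hypothetical helper name, and the first number of each pair is labelled lon to match the column order in the question's wanted output.

```r
library(jsonlite)

df <- data.frame(code = c("12000", "89000"),
                 polygon = c("[[[11,12], [13,14], [15,16]], [[21, 22], [23,24], [25,26]]]",
                             "[[[81,82], [83,84], [85,86]]]"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: takes one code and one JSON string, returns one row per point
polygon_points <- function(code, polygon) {
  # simplifyMatrix = FALSE keeps a list of polygons, each a list of 2-element vectors
  polys <- fromJSON(polygon, simplifyMatrix = FALSE)
  pieces <- lapply(seq_along(polys), function(i) {
    pts <- do.call(rbind, polys[[i]])  # stack the pairs into an n x 2 matrix
    data.frame(a = code, lon = pts[, 1], lat = pts[, 2], id = i,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

# Apply the helper to every row and stack the results
result <- do.call(rbind, Map(polygon_points, df$code, df$polygon))
rownames(result) <- NULL
result
```

Swap the lon and lat labels if your pairs are stored latitude first.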
This isn't pretty, but some calls to strsplit, gsub, and unnest can do quite a bit:
- Splitting on ]], allows us to separate several polygons.
- The id column can easily be done with row_number within each code.
- Split on ], to separate the point pairs.
- Remove [ and ].
- Split on , to separate lon and lat.
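library(dplyr)
library(tidyr)
df %>%
  mutate(polygon = strsplit(polygon, ']],')) %>%     # one element per polygon
  unnest(polygon) %>%
  group_by(code) %>%
  mutate(id = row_number(),                          # polygon number within each code
         polygon = strsplit(polygon, '],')) %>%      # one element per point pair
  unnest(polygon) %>%
  mutate(polygon = gsub(']|\\[', '', polygon),       # drop the remaining brackets
         polygon = strsplit(polygon, ','),
         lon = sapply(polygon, '[', 1),
         lat = sapply(polygon, '[', 2)) %>%
  select(-polygon)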
Source: local data frame [9 x 4]
Groups: code [2]

   code    id   lon   lat
  <chr> <int> <chr> <chr>
1 12000     1    11    12
2 12000     1    13    14
3 12000     1    15    16
4 12000     2    21    22
5 12000     2    23    24
6 12000     2    25    26
7 89000     1    81    82
8 89000     1    83    84
9 89000     1    85    86
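Note that lon and lat come back as character columns here (the <chr> in the printout), which matters if the result is going to be plotted. A minimal sketch of the conversion follows; res is a hypothetical stand-in for the result above, with only a few rows typed in by hand, including values with a leading space as the string splitting can leave behind.

```r
# Hypothetical stand-in for the strsplit-based result (character coordinates)
res <- data.frame(code = c("12000", "12000", "89000"),
                  id   = c(1L, 2L, 1L),
                  lon  = c("11", " 21", "81"),
                  lat  = c("12", " 22", " 82"),
                  stringsAsFactors = FALSE)

# as.numeric() tolerates surrounding whitespace, so no trimming is needed
res[c("lon", "lat")] <- lapply(res[c("lon", "lat")], as.numeric)
str(res)
```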
I think there is one closing bracket too many in df$polygon[2]. If that is removed, you could do the following:
require(jsonlite)
require(reshape2)
parse_json <- function(polygon, code){
  molten <- melt(fromJSON(polygon))
  lat <- molten[which(molten$Var3==1), "value"]
  lon <- molten[which(molten$Var3==2), "value"]
  id <- molten[which(molten$Var3==1), "Var1"]
  data.frame(code, lat, lon, id)
}
dat_raw <- mapply(parse_json, df$polygon, df$code, SIMPLIFY = FALSE, USE.NAMES = FALSE)
do.call(rbind, dat_raw)
Which gives you:
code lat lon id
1 12000 11 12 1
2 12000 21 22 2
3 12000 13 14 1
4 12000 23 24 2
5 12000 15 16 1
6 12000 25 26 2
7 89000 81 82 1
8 89000 83 84 1
9 89000 85 86 1
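Since the stated goal is to plot the result with ggplot, here is a rough sketch of that last step. It assumes the column names from the question's wanted output (a, lon, lat, id) and rebuilds that data frame by hand; grouping on the combination of a and id is one way to keep the polygons of different postal codes apart.

```r
library(ggplot2)

# The wanted data frame from the question, typed in by hand
wanted <- data.frame(
  a   = rep(c("12000", "89000"), times = c(6, 3)),
  lon = c(11, 13, 15, 21, 23, 25, 81, 83, 85),
  lat = c(12, 14, 16, 22, 24, 26, 82, 84, 86),
  id  = c(1, 1, 1, 2, 2, 2, 1, 1, 1)
)

# One polygon per (postal code, id) pair, filled by postal code
p <- ggplot(wanted, aes(x = lon, y = lat,
                        group = interaction(a, id), fill = a)) +
  geom_polygon(colour = "black") +
  coord_equal()
print(p)
```

The sample points in each polygon happen to lie on a straight line, so with this toy data the shapes collapse to line segments; real polygon outlines will show as filled areas.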