I have this sample data:
df <- tibble(
"City1" = c("New York", "Boston", "Chicago"),
"City2" = c("Chicago", "Cleveland", "Atlanta"))
Assume City1
is the starting point, and City2
is the destination. I.e., a person traveled from New York to Chicago.
I want to add a column for the starting latitude and a column for starting longitude, and also do the same for the destination city. In all, I want four new columns. I already have the coordinates.
How can I assign the coordinates? I have tried using case_when
, but I am not sure how to deliver coordinates to multiple columns. It is easy to do one column:
library(tidyverse)
# The numbers after the cities are the latitudes
df <- df %>%
mutate(
City1_lat = case_when(
City1 == 'New York' ~ 40.7128,
City1 == 'Boston' ~ 42.3601,
City1 == 'Chicago' ~ 41.8781
)
)
How can I expand this to add in a City1_lon
column? Trying to streamline this as much as possible, since I have several thousand rows of origins/destinations. Either a dplyr
or base
solution works. I would expand this for the destination cities, City2
. For reference:
New York: 40.7128, 74.0060
Boston: 42.3601, 71.0589
Chicago: 41.8781, 87.6298
Cleveland: 41.4993, 81.6944
Atlanta: 33.7490, 84.3880
With your city data in a dataframe like this:
> city
City lat long
1 New York 40.7128 74.0060
2 Boston 42.3601 71.0589
3 Chicago 41.8781 87.6298
4 Cleveland 41.4993 81.6944
5 Atlanta 33.7490 84.3880
Use match
to lookup city names across the tables, extract the lat-long, and renaming gives this:
> setNames(city[match(df$City1, city$City), c("lat","long")],c("City1lat","City1long"))
City1lat City1long
1 40.7128 74.0060
2 42.3601 71.0589
3 41.8781 87.6298
> setNames(city[match(df$City2, city$City), c("lat","long")],c("City2lat","City2long"))
City2lat City2long
3 41.8781 87.6298
4 41.4993 81.6944
5 33.7490 84.3880
which you can cbind
onto your original data:
> df = cbind(df, setNames(city[match(df$City1, city$City), c("lat","long")],c("City1lat","City1long")), setNames(city[match(df$City2, city$City), c("lat","long")],c("City2lat","City2long")))
> df
City1 City2 City1lat City1long City2lat City2long
1 New York Chicago 40.7128 74.0060 41.8781 87.6298
2 Boston Cleveland 42.3601 71.0589 41.4993 81.6944
3 Chicago Atlanta 41.8781 87.6298 33.7490 84.3880
One option is to do a left_join
after creating a 'keyval' dataset
library(tidyverse)
map_dfc(names(df), ~ df %>%
select(.x) %>%
left_join(keyval, by = setNames('City', .x))) %>%
select(names(df), everything())
# A tibble: 3 x 6
# City1 City2 lat lon lat1 lon1
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 New York Chicago 40.7 74.0 41.9 87.6
#2 Boston Cleveland 42.4 71.1 41.5 81.7
#3 Chicago Atlanta 41.9 87.6 33.7 84.4
If there are more columns in the original data and we are only interested in the 'City' columns, then only loop through the 'City' columns
df$journeys <- (100,200,300)
nm1 <- grep("City", names(df), value = TRUE)
map_dfc(nm1, ~ df %>%
select(.x) %>%
left_join(keyval, by = setNames('City', .x))) %>%
bind_cols(df %>%
select(-one_of(nm1)))
keyval <- structure(list(City = c("New York", "Boston", "Chicago", "Cleveland",
"Atlanta"), lat = c(40.7128, 42.3601, 41.8781, 41.4993, 33.749
), lon = c(74.0068, 71.0589, 87.6298, 81.6944, 84.388)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
Here's a tidyverse solution:
library(dplyr)
library(purrr)
df <- tibble(
"City1" = c("New York", "Boston", "Chicago"),
"City2" = c("Chicago", "Cleveland", "Atlanta"))
df <- df %>%
mutate(
City1_coords = case_when(
City1 == 'New York' ~ list(c(40.7128,74.0060)),
City1 == 'Boston' ~ list(c(42.3601,71.0589)),
City1 == 'Chicago' ~ list(c(41.8781,87.6298))
)
) %>%
mutate(City1_lat = City1_coords %>% map_dbl(~ .x[1] ),
City1_lon = City1_coords %>% map_dbl(~ .x[2] ))
Here is a way to do it using mutate_all
and unnest
, with a bonus hack for naming the columns :
df %>%
mutate_all(funs(l = case_when(
. == 'New York' ~ list(tibble(at=40.7128, on=74.0060)),
. == 'Boston' ~ list(tibble(at=42.3601, on=71.0589)),
. == 'Chicago' ~ list(tibble(at=41.8781, on=87.6298)),
. == 'Cleveland' ~ list(tibble(at=41.4993, on=81.6944)),
. == 'Atlanta' ~ list(tibble(at=33.7490, on=84.3880))
)
)) %>%
unnest(.sep = "")
# # A tibble: 3 x 6
# City1 City2 City1_lat City1_lon City2_lat City2_lon
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 New York Chicago 40.7128 74.0060 41.8781 87.6298
# 2 Boston Cleveland 42.3601 71.0589 41.4993 81.6944
# 3 Chicago Atlanta 41.8781 87.6298 33.7490 84.3880
This addresses "Using case_when() to assign two new columns".
To solve the general problem I would recommend a solution based on left joins, as it is more flexible to have your keys and values in a neat separate table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With