Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using case_when() to assign two new columns, instead of one

Tags:

r

dplyr

I have this sample data:

df <- tibble(
  "City1" = c("New York", "Boston", "Chicago"),
  "City2" = c("Chicago", "Cleveland", "Atlanta"))

Assume City1 is the starting point, and City2 is the destination. I.e., a person traveled from New York to Chicago.

I want to add a column for the starting latitude and a column for starting longitude, and also do the same for the destination city. In all, I want four new columns. I already have the coordinates.

How can I assign the coordinates? I have tried using case_when, but I am not sure how to deliver coordinates to multiple columns. It is easy to do one column:

library(tidyverse)

# The numbers after the cities are the latitudes
df <- df %>% 
  mutate(
   City1_lat = case_when(
    City1 == 'New York' ~ 40.7128,
    City1 == 'Boston' ~ 42.3601,
    City1 == 'Chicago' ~ 41.8781
  )
 )

How can I expand this to add in a City1_lon column? Trying to streamline this as much as possible, since I have several thousand rows of origins/destinations. Either a dplyr or base solution works. I would expand this for the destination cities, City2. For reference:

New York: 40.7128, 74.0060
Boston: 42.3601, 71.0589
Chicago: 41.8781, 87.6298
Cleveland: 41.4993, 81.6944
Atlanta: 33.7490, 84.3880
like image 842
papelr Avatar asked Jan 28 '19 17:01

papelr


4 Answers

With your city data in a dataframe like this:

> city
       City     lat    long
1  New York 40.7128 74.0060
2    Boston 42.3601 71.0589
3   Chicago 41.8781 87.6298
4 Cleveland 41.4993 81.6944
5   Atlanta 33.7490 84.3880

Use match to lookup city names across the tables, extract the lat-long, and renaming gives this:

> setNames(city[match(df$City1, city$City), c("lat","long")],c("City1lat","City1long"))
  City1lat City1long
1  40.7128   74.0060
2  42.3601   71.0589
3  41.8781   87.6298

> setNames(city[match(df$City2, city$City), c("lat","long")],c("City2lat","City2long"))
  City2lat City2long
3  41.8781   87.6298
4  41.4993   81.6944
5  33.7490   84.3880

which you can cbind onto your original data:

> df = cbind(df, setNames(city[match(df$City1, city$City), c("lat","long")],c("City1lat","City1long")), setNames(city[match(df$City2, city$City), c("lat","long")],c("City2lat","City2long")))
> df
     City1     City2 City1lat City1long City2lat City2long
1 New York   Chicago  40.7128   74.0060  41.8781   87.6298
2   Boston Cleveland  42.3601   71.0589  41.4993   81.6944
3  Chicago   Atlanta  41.8781   87.6298  33.7490   84.3880
like image 149
Spacedman Avatar answered Nov 19 '22 01:11

Spacedman


One option is to do a left_join after creating a 'keyval' dataset

library(tidyverse)
map_dfc(names(df), ~  df %>% 
                        select(.x) %>% 
                        left_join(keyval, by = setNames('City', .x))) %>%
    select(names(df), everything())  
# A tibble: 3 x 6
#  City1    City2       lat   lon  lat1  lon1
#  <chr>    <chr>     <dbl> <dbl> <dbl> <dbl>
#1 New York Chicago    40.7  74.0  41.9  87.6
#2 Boston   Cleveland  42.4  71.1  41.5  81.7
#3 Chicago  Atlanta    41.9  87.6  33.7  84.4

If there are more columns in the original data and we are only interested in the 'City' columns, then only loop through the 'City' columns

df$journeys <- (100,200,300)
nm1 <- grep("City", names(df), value = TRUE)
map_dfc(nm1, ~  df %>% 
                     select(.x) %>% 
                     left_join(keyval, by = setNames('City', .x))) %>%  
      bind_cols(df %>% 
                  select(-one_of(nm1)))

data

keyval <- structure(list(City = c("New York", "Boston", "Chicago", "Cleveland", 
 "Atlanta"), lat = c(40.7128, 42.3601, 41.8781, 41.4993, 33.749
 ), lon = c(74.0068, 71.0589, 87.6298, 81.6944, 84.388)), row.names = c(NA, 
  -5L), class = c("tbl_df", "tbl", "data.frame"))
like image 40
akrun Avatar answered Nov 19 '22 02:11

akrun


Here's a tidyverse solution:

library(dplyr)
library(purrr)

df <- tibble(
  "City1" = c("New York", "Boston", "Chicago"),
  "City2" = c("Chicago", "Cleveland", "Atlanta"))


df <- df %>% 
  mutate(
    City1_coords = case_when(
      City1 == 'New York' ~ list(c(40.7128,74.0060)),
      City1 == 'Boston' ~ list(c(42.3601,71.0589)),
      City1 == 'Chicago' ~ list(c(41.8781,87.6298))
    )
  ) %>% 
  mutate(City1_lat = City1_coords %>% map_dbl(~ .x[1] ),
         City1_lon = City1_coords %>% map_dbl(~ .x[2] ))
like image 2
thc Avatar answered Nov 19 '22 02:11

thc


Here is a way to do it using mutate_all and unnest, with a bonus hack for naming the columns :

df %>% 
  mutate_all(funs(l = case_when(
      . == 'New York'  ~ list(tibble(at=40.7128, on=74.0060)),
      . == 'Boston'    ~ list(tibble(at=42.3601, on=71.0589)),
      . == 'Chicago'   ~ list(tibble(at=41.8781, on=87.6298)),
      . == 'Cleveland' ~ list(tibble(at=41.4993, on=81.6944)),
      . == 'Atlanta'   ~ list(tibble(at=33.7490, on=84.3880))
    )
  )) %>%
  unnest(.sep = "")

# # A tibble: 3 x 6
#      City1     City2 City1_lat City1_lon City2_lat City2_lon
#      <chr>     <chr>     <dbl>     <dbl>     <dbl>     <dbl>
# 1 New York   Chicago   40.7128   74.0060   41.8781   87.6298
# 2   Boston Cleveland   42.3601   71.0589   41.4993   81.6944
# 3  Chicago   Atlanta   41.8781   87.6298   33.7490   84.3880

This addresses "Using case_when() to assign two new columns".

To solve the general problem I would recommend a solution based on left joins, as it is more flexible to have your keys and values in a neat separate table.

like image 2
Moody_Mudskipper Avatar answered Nov 19 '22 03:11

Moody_Mudskipper