Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to match values of several variables to a variable in a look up table?

I have two datasets:

loc <- c("a","b","c","d","e")
id1 <- c(NA,9,3,4,5)
id2 <- c(2,3,7,5,6)
id3 <- c(2,NA,5,NA,7)
cost1 <- c(10,20,30,40,50)
cost2 <- c(50,20,30,30,50)
cost3 <- c(40,20,30,10,20)
dt <- data.frame(loc,id1,id2,id3,cost1,cost2,cost3)


id <- c(1,2,3,4,5,6,7)
rate <- c(0.9,0.8,0.7,0.6,0.5,0.4,0.3)
lookupd_tb <- data.frame(id,rate)

what I want to do, is to match the values in dt with lookup_tb for id1,id2 and id3 and if there is a match, multiply rate for that id to its related cost.

This is my approach:

dt <- dt %>% 
left_join(lookupd_tb , by=c("id1"="id")) %>%
dplyr :: mutate(cost1 = ifelse(!is.na(rate), cost1*rate, cost1)) %>% 
dplyr :: select (-rate)

what I am doing now, works fine but I have to repeat it 3 times for each variable and I was wondering if there is a more efficient way to do this(probably using apply family?)

I tried to join all three variables with id in my look up table but when rate is joined with my dt, all the costs (cost1, cost2 and cost3) will be multiply by the same rate which I don't want.

I appreciate your help!

like image 690
Rio Avatar asked May 15 '18 13:05

Rio


1 Answers

A base R approach would be to loop through the columns of 'id' using sapply/lapply, get the matching index from the 'id' column of 'lookupd_tb', based on the index, get the corresponding 'rate', replace the NA elements with 1, multiply with 'cost' columns and update the 'cost' columns

nmid <- grep("id", names(dt))
nmcost <- grep("cost", names(dt))

dt[nmcost] <- dt[nmcost]*sapply(dt[nmid], function(x) {
         x1 <- lookupd_tb$rate[match(x, lookupd_tb$id)]
          replace(x1, is.na(x1), 1) })

Or using tidyverse, we can loop through the sets of columns i.e. 'id' and 'cost' with purrr::map2, then do the same approach as above. The only diference is that here we created new columns instead of updating the 'cost' columns

library(tidyverse)
dt %>% 
   select(nmid) %>% 
   map2_df(., dt %>% 
               select(nmcost), ~  
                 .x %>% 
                     match(., lookupd_tb$id) %>%
                     lookupd_tb$rate[.] %>% 
                     replace(., is.na(.),1) * .y ) %>%
    rename_all(~ paste0("costnew", seq_along(.))) %>%
    bind_cols(dt, .)
like image 139
akrun Avatar answered Sep 28 '22 02:09

akrun