
R - merging two tables and calculating transfers between variables

Tags: r

I have two data frames in R, Sales and Transfer (both shown below).

For every store in the Sales df, if a product appears in the "Product" column of the Transfer df, I want to transfer its sales to the product(s) listed in the "Replacement_Product" column.

Transfer rule: if the store carries both replacement products, 90% of the "from" product's sales is split equally between the two. If the store carries only one of the replacement products, 80% of the "from" product's sales is transferred to that one.
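
The rule can be written as a small function of how many replacement products a store carries. This is only a sketch of the rule as stated: `transfer_share` is a made-up name, and returning 0 when no replacement is stocked is an assumption inferred from the expected output (store 3 has neither B nor C, and A's sales simply disappear there).

```r
# Share of the "from" product's sales that EACH available replacement receives.
# n_available: number of replacement products the store carries (0, 1 or 2).
transfer_share <- function(n_available) {
  ifelse(n_available == 2, 0.9 / 2,            # 90% split equally between two
         ifelse(n_available == 1, 0.8, 0))     # 80% to a single one, else nothing (assumed)
}

transfer_share(c(0, 1, 2))
```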

Where I am getting stuck is that the transfer is not always one-to-one. Using store 1 as an example: product A transfers to both B and C, so 90% of A's 200 sales is split equally between them, i.e. 200 * 0.9 = 180, so 90 gets added to each of B and C. Product X, however, transfers to Y and Z in the Transfer table, but store 1 only has X and Y, so in this case 80% of X's sales (400 * 0.8 = 320) is added to Y.
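
The store 1 arithmetic can be checked by hand in a few lines. The numbers are the store 1 rows of the Sales table; the variable names (`s1`, `a_each`, `x_to_y`, `store1_new`) are made up for this illustration.

```r
# Store 1 sales, copied from the Sales table
s1 <- c(A = 200, B = 100, C = 200, X = 400, Y = 350)

# A -> B and C, both stocked: 90% of A, split equally
a_each <- s1[["A"]] * 0.9 / 2
# X -> Y and Z, only Y stocked: 80% of X goes to Y
x_to_y <- s1[["X"]] * 0.8

store1_new <- c(B = s1[["B"]] + a_each,
                C = s1[["C"]] + a_each,
                Y = s1[["Y"]] + x_to_y)
store1_new
```

This gives 190 for B, 290 for C and 670 for Y, matching the store 1 rows of the output table.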

If it were one-to-one, I could join the table to itself and then do the calculation. But having to check whether the store has 0, 1, or 2 of the replacement products, with the transfer percentage depending on that count, has me stuck on how to start.
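
One possible way to get the 0/1/2 count, sketched in base R: build a store-by-product "is it stocked" matrix and count each from-product's replacements per store. The data is retyped from the tables in this question; `stocked`, `from` and `n_available` are names chosen here, and the lookup assumes every replacement product is sold in at least one store (otherwise the column subset would fail).

```r
# Data retyped from the question's tables (characters instead of factors)
Sales <- data.frame(
  Product = c("A","B","C","X","Y", "A","B","C","X","Y","Z", "A","X","Z"),
  Store   = c(1,1,1,1,1, 2,2,2,2,2,2, 3,3,3),
  Sales   = c(200,100,200,400,350, 1000,1000,600,700,800,400, 1000,500,400),
  stringsAsFactors = FALSE
)
Transfer <- data.frame(
  Product             = c("A","A","X","X"),
  Replacement_Product = c("B","C","Y","Z"),
  stringsAsFactors = FALSE
)

# Logical matrix: rows are stores, columns are products, TRUE if stocked
stocked <- unclass(table(Sales$Store, Sales$Product)) > 0

# Replacement products of each from-product, e.g. from$A is c("B", "C")
from <- split(Transfer$Replacement_Product, Transfer$Product)

# Per store (rows) and from-product (columns): how many replacements are stocked
n_available <- sapply(from, function(repl) rowSums(stocked[, repl, drop = FALSE]))
n_available
```

For the sample data the A column is 2, 2, 0 and the X column is 1, 2, 1 for stores 1 to 3, which covers all three cases.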

Sales df:

+----------+-------+-------+
| Product  | Store | Sales |
+----------+-------+-------+
| A        |     1 |   200 |
| B        |     1 |   100 |
| C        |     1 |   200 |
| X        |     1 |   400 |
| Y        |     1 |   350 |
| A        |     2 |  1000 |
| B        |     2 |  1000 |
| C        |     2 |   600 |
| X        |     2 |   700 |
| Y        |     2 |   800 |
| Z        |     2 |   400 |
| A        |     3 |  1000 |
| X        |     3 |   500 |
| Z        |     3 |   400 |
+----------+-------+-------+

Transfer df:
A list of each product and its replacement products:

+---------+---------------------+
| Product | Replacement_Product |
+---------+---------------------+
| A       | B                   |
| A       | C                   |
| X       | Y                   |
| X       | Z                   |
+---------+---------------------+

Output table:

+----------+-------+-------+
| Product  | Store | Sales |
+----------+-------+-------+
| B        |     1 |   190 |
| C        |     1 |   290 |
| Y        |     1 |   670 |
| B        |     2 |  1450 |
| C        |     2 |  1050 |
| Y        |     2 |  1115 |
| Z        |     2 |   715 |
| Z        |     3 |   800 |
+----------+-------+-------+

What I have tried:

test <- sqldf("select a.*, b.Replacement_Product from Sales a 
left join Transfer b on a.Product = b.Product
where a.Product in ('A','B')")

Once I have a table with each product and its replacements, I was going to join it to Sales again on Replacement_Product and Store to get the replacement's sales. Then, depending on whether that is 0, I could use an if statement to calculate the new sales, and finally remove A, B, etc. However, this does not seem scalable if the table is large and I can't hard-code the products being removed.

Here is the Sales df:

structure(list(Product = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 
2L, 3L, 4L, 5L, 6L, 1L, 4L, 6L), .Label = c("A", "B", "C", "X", 
"Y", "Z"), class = "factor"), Store = c(1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), Sales = c(200L, 100L, 200L, 
400L, 350L, 1000L, 1000L, 600L, 700L, 800L, 400L, 1000L, 500L, 
400L)), .Names = c("Product", "Store", "Sales"), class = "data.frame", row.names = c(NA, 
-14L))

Transfer df:

structure(list(Product = structure(c(1L, 1L, 2L, 2L), .Label = c("A", 
"X"), class = "factor"), Replacement_Product = structure(1:4, .Label = c("B", 
"C", "Y", "Z"), class = "factor")), .Names = c("Product", "Replacement_Product"
), class = "data.frame", row.names = c(NA, -4L))
asked Oct 16 '22 at 11:10 by Lulumocha


2 Answers

There's probably a more concise way, but here's a start using dplyr.

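library(dplyr)
# assumes the question's data frames are stored as `sales` and `transfer`
sales %>%
  # attach each product's candidate replacements
  inner_join(transfer, by = "Product") %>%
  # keep only the replacements that the same store actually stocks
  inner_join(sales %>% select(Replacement_Product = Product, Store),
             by = c("Replacement_Product", "Store")) %>%
  # n = number of replacements available in that store (1 or 2)
  add_count(Product, Store) %>%
  mutate(Sales_trans = if_else(n == 2, Sales * 0.9 / 2, Sales * 0.8)) %>%
  # look up the replacement's own sales (this becomes Sales.y)
  left_join(sales, by = c("Replacement_Product" = "Product", "Store")) %>%
  mutate(total = Sales_trans + Sales.y) %>%
  select(Product = Replacement_Product, Store, total)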

  Product Store total
  <chr>   <int> <dbl>
1 B           1   190
2 C           1   290
3 Y           1   670
4 B           2  1450
5 C           2  1050
6 Y           2  1115
7 Z           2   715
8 Z           3   800
answered Nov 15 '22 at 13:11 by Jon Spring


Here's a dense data.table way:

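library(data.table)
# assumes the question's data frames are stored as `sales` and `transfer`
setDT(sales)
setDT(transfer)

# attach candidate replacements to every sales row (NA where there are none)
merged <- transfer[sales, on = .(Product), allow.cartesian = TRUE]

# self-join: x.* is the replacement's own row, i.* is the product being transferred;
# nomatch = 0L drops rows with no match, e.g. replacements the store does not stock
merged[merged
         , on = .(Product = Replacement_Product, Store)
         , .(x.Product, i.Product, Store, Sales, i.Sales)
         , nomatch = 0L
         ][, Total_Sales := Sales + ifelse(.N == 2, 0.9 * 0.5 * i.Sales, 0.8 * i.Sales)
           , by = .(i.Product, Store)
           ][, .(Product = x.Product, Store, Total_Sales)]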

   Product Store Total_Sales
1:       B     1         190
2:       C     1         290
3:       Y     1         670
4:       B     2        1450
5:       C     2        1050
6:       Y     2        1115
7:       Z     2         715
8:       Z     3         800
answered Nov 15 '22 at 13:11 by Cole
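
For comparison, the same join-count-transfer logic can be sketched in base R with no extra packages. This is not either answerer's code, just a minimal illustration under the same assumptions: the data is retyped from the question's tables, and `moves`, `own`, `Own_Sales`, `n`, `Total` and `result` are names chosen here.

```r
# Data retyped from the question's tables (characters instead of factors)
sales <- data.frame(
  Product = c("A","B","C","X","Y", "A","B","C","X","Y","Z", "A","X","Z"),
  Store   = c(1,1,1,1,1, 2,2,2,2,2,2, 3,3,3),
  Sales   = c(200,100,200,400,350, 1000,1000,600,700,800,400, 1000,500,400),
  stringsAsFactors = FALSE
)
transfer <- data.frame(
  Product             = c("A","A","X","X"),
  Replacement_Product = c("B","C","Y","Z"),
  stringsAsFactors = FALSE
)

# From-product sales next to each candidate replacement
moves <- merge(sales, transfer, by = "Product")

# Look up the replacement's own sales in the same store; merge() is an inner
# join by default, so replacements the store does not stock are dropped
own <- setNames(sales, c("Replacement_Product", "Store", "Own_Sales"))
moves <- merge(moves, own, by = c("Replacement_Product", "Store"))

# Number of stocked replacements per store and from-product
moves$n <- ave(moves$Sales, moves$Store, moves$Product, FUN = length)

# 90% split in two when both are stocked, otherwise 80% to the single one
moves$Total <- moves$Own_Sales +
  ifelse(moves$n == 2, 0.9 / 2, 0.8) * moves$Sales

result <- moves[order(moves$Store, moves$Replacement_Product),
                c("Replacement_Product", "Store", "Total")]
names(result)[1] <- "Product"
rownames(result) <- NULL
result
```

On the sample data this reproduces the eight rows of the output table. Like the two answers, it would return one row per transfer if a replacement were fed by several from-products; that case does not occur in the sample data and would need an extra sum by Product and Store.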