I need to recode a continuous variable into categories, usually i use the "cut" Function, but in the cut function i need to specify the breaks. i am looking for a way to have a different set of breaks depending on other categorical variables in my data frame.
the variable in my example is Cost and the "breaks" are in the second table "cost.range", i have a different set of Breaks for each "Region" and each "Category"
Example :
Region Product Category Cost
Country A Product 1 CAT A 731
Country B Product 1 CAT A 659
Country C Product 1 CAT A 385
Country D Product 1 CAT A 763
Country A Product 2 CAT A 701
Country B Product 2 CAT A 759
Country C Product 2 CAT A 580
Country D Product 2 CAT A 147
Country A Product 3 CAT B 645
Country B Product 3 CAT B 657
Country C Product 3 CAT B 424
Region Category Cost.Range Range
Country A CAT A 10 R1
Country A CAT A 50 R2
Country A CAT A 200 R3
Country A CAT A 1000 R4
Country A CAT B 20 R1
Country A CAT B 100 R2
Country A CAT B 400 R3
Country A CAT B 1500 R4
code to generate the example :
Region <- c("Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D")
Product <- c("Product 1","Product 1","Product 1","Product 1","Product 2","Product 2","Product 2","Product 2","Product 3","Product 3","Product 3","Product 3","Product 4","Product 4","Product 4","Product 4")
Category <- c("CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B")
Cost <- c(731,659,385,763,701,759,580,147,645,657,424,34,850,463,160,550)
Table1 <- data.frame(Region, Product, Category, Cost)
Region <- c("Country A","Country A","Country A","Country A","Country A","Country A","Country A","Country A")
Category <- c("CAT A","CAT A","CAT A","CAT A","CAT B","CAT B","CAT B","CAT B")
Cost.range <- c(10,50,200,1000,20,100,400,1500)
Range <- c("R1","R1","R3","R4","R1","R2","R3","R4")
Table2 <- data.frame(Region, Category, Cost.range, Range)
This is not the most elegant solution (and I'd be interested to see a better method) but it should achieve the result you're looking for.
The select() and distinct() functions from the dplyr package find the possible combinations of Region and Category. These combinations are used to subset the two tables and apply the cut() function to each subset.
library('dplyr')
library('data.table')
dt1 <- data.table(Table1)
dt2 <- data.table(Table2)
t2d <- Table2 %>% select(Region, Category) %>% distinct
for(i in 1:nrow(t2d)){
dt2_range_subset <- dt2[Region == as.character(t2d$Region[i])
& Category == t2d$Category[i], Cost.range]
dt1[Region == as.character(t2d$Region[i]) & Category == t2d$Category[i],
Cost_factor := cut(as.matrix(Cost), dt2_range_subset)]
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With