Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to recode a continuous variable into Ranges

Tags:

r

I need to recode a continuous variable into categories, usually i use the "cut" Function, but in the cut function i need to specify the breaks. i am looking for a way to have a different set of breaks depending on other categorical variables in my data frame.

the variable in my example is Cost and the "breaks" are in the second table "cost.range", i have a different set of Breaks for each "Region" and each "Category"

Example :

Region    Product     Category Cost
Country A Product 1     CAT A  731
Country B Product 1     CAT A  659
Country C Product 1     CAT A  385
Country D Product 1     CAT A  763
Country A Product 2     CAT A  701
Country B Product 2     CAT A  759
Country C Product 2     CAT A  580
Country D Product 2     CAT A  147
Country A Product 3     CAT B  645
Country B Product 3     CAT B  657
Country C Product 3     CAT B  424


Region       Category  Cost.Range      Range
Country A      CAT A         10          R1
Country A      CAT A         50          R2
Country A      CAT A        200          R3
Country A      CAT A       1000          R4
Country A      CAT B         20          R1
Country A      CAT B        100          R2
Country A      CAT B        400          R3
Country A      CAT B       1500          R4

code to generate the example :

Region <- c("Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D")
Product <- c("Product 1","Product 1","Product 1","Product 1","Product 2","Product 2","Product 2","Product 2","Product 3","Product 3","Product 3","Product 3","Product 4","Product 4","Product 4","Product 4")
Category <- c("CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B")
Cost <- c(731,659,385,763,701,759,580,147,645,657,424,34,850,463,160,550)

Table1 <- data.frame(Region, Product, Category, Cost)

Region <- c("Country A","Country A","Country A","Country A","Country A","Country A","Country A","Country A")
Category <- c("CAT A","CAT A","CAT A","CAT A","CAT B","CAT B","CAT B","CAT B")
Cost.range <- c(10,50,200,1000,20,100,400,1500)
Range <- c("R1","R1","R3","R4","R1","R2","R3","R4")

Table2 <- data.frame(Region, Category, Cost.range, Range)
like image 321
akruug Avatar asked Nov 26 '25 22:11

akruug


1 Answers

This is not the most elegant solution (and I'd be interested to see a better method) but it should achieve the result you're looking for.

The select() and distinct() functions from the dplyr package find the possible combinations of Region and Category. These combinations are used to subset the two tables and apply the cut() function to each subset.

library('dplyr')
library('data.table')

dt1 <- data.table(Table1)
dt2 <- data.table(Table2)

t2d <- Table2 %>% select(Region, Category) %>% distinct

for(i in 1:nrow(t2d)){
  dt2_range_subset <- dt2[Region == as.character(t2d$Region[i]) 
                          & Category == t2d$Category[i], Cost.range]
  dt1[Region == as.character(t2d$Region[i]) & Category == t2d$Category[i],
      Cost_factor := cut(as.matrix(Cost), dt2_range_subset)]
}
like image 90
NGaffney Avatar answered Nov 29 '25 14:11

NGaffney



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!