
How to collapse a categorical variable into fewer levels in R

Tags:

r

Suppose I have a categorical variable such as:

set.seed(123)
x <- sample(c("I", "IA", "IB", "II", "IIB", "IIC", "III", "IIID", "IIIF", "XA", "XB", "XC"),
            100, TRUE)
table(x, exclude=NULL)

#    x
#   I   IA   IB   II  IIB  IIC  III IIID IIIF   XA   XB   XC <NA> 
#   5   12    9    7    9   11    6    8    6   12    9    6    0 

How can I easily collapse x into four levels (I, II, III and X), for example by combining I, IA and IB into I?

David Z asked Dec 05 '25 09:12


1 Answer

More generally, if your categories don't share a naming pattern like this, you can specify the mapping explicitly using case_when from dplyr:

library(dplyr)

y <- case_when(x %in% c("I", "IA", "IB") ~ "I",      # or whatever conditions you want
               x %in% c("II", "IIA", "IIB") ~ "II",  # as above
               TRUE ~ "III")                         # fallback: every value not matched above
table(y)

  I  II III 
 33  24  43 
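
For the labels in the question, which do follow a pattern (a Roman-numeral stem plus an optional suffix letter), a regular expression in base R may be simpler than listing every value. This is only a minimal sketch, assuming every value starts with one to three I's or with an X; the regex is an illustration and not part of the answer above:

```r
set.seed(123)
x <- sample(c("I", "IA", "IB", "II", "IIB", "IIC", "III", "IIID", "IIIF", "XA", "XB", "XC"),
            100, TRUE)

# Keep the leading run of one to three "I"s, or a leading "X", and drop the rest.
# perl = TRUE is used so the quantifier is matched greedily (longest run of I's first).
y <- sub("^(I{1,3}|X).*$", "\\1", x, perl = TRUE)

table(y, exclude = NULL)
```

Values that don't match the pattern are returned unchanged by sub, so it's worth checking table(y) for unexpected levels.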
Z.Lin answered Dec 08 '25 00:12