
Combine multiple categorical variables in one dummy variable

Tags:

r

I have 3 categorical variables

agegroup{<20, 20-30, >30}
disease.level{0, 1, 2}
performance{<60, >=60}

and I would like to combine them into one dummy variable with 3x3x2 = 18 levels. Is there a fast way to do this? My original dataset has about 10 variables with multiple levels in each.

Basically, I am asking for the exact opposite of this question: "Create new dummy variable columns from categorical variable".

Thanks a lot, EC

ECII asked Dec 07 '11 17:12


1 Answer

I'm not sure whether by "dummy variable" you mean 0/1 indicator variables (in which case you would have 18 dummy variables) or a single factor with 18 levels. It sounds like the latter, which interaction() produces directly. (Actually, paste would work as well as interaction, although interaction is a bit more self-describing.)

> ff <- expand.grid(agegroup=factor(c("<20","20-30",">30")),
       disease.level=factor(0:2),performance=factor(c("<60",">=60")))
> combfac <- with(ff,interaction(agegroup,disease.level,performance))
> combfac
 [1] <20.0.<60    20-30.0.<60  >30.0.<60    <20.1.<60    20-30.1.<60 
 [6] >30.1.<60    <20.2.<60    20-30.2.<60  >30.2.<60    <20.0.>=60  
[11] 20-30.0.>=60 >30.0.>=60   <20.1.>=60   20-30.1.>=60 >30.1.>=60  
[16] <20.2.>=60   20-30.2.>=60 >30.2.>=60  
18 Levels: <20.0.<60 20-30.0.<60 >30.0.<60 <20.1.<60 20-30.1.<60 ... >30.2.>=60
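
The paste alternative mentioned above could look roughly like the following. This is a minimal sketch: the name combfac_paste is made up for illustration, and ff is rebuilt here only so the snippet runs on its own. One difference worth knowing is that paste only produces labels for combinations that actually occur in the data, whereas interaction keeps all possible combinations as levels unless drop=TRUE is given.

```r
# Same illustrative grid as in the answer
ff <- expand.grid(
  agegroup      = factor(c("<20", "20-30", ">30")),
  disease.level = factor(0:2),
  performance   = factor(c("<60", ">=60"))
)

# paste() builds one label per row; factor() turns the labels into a single variable
combfac_paste <- factor(with(ff, paste(agegroup, disease.level, performance, sep = ".")))

nlevels(combfac_paste)  # 18, because every combination occurs once in this grid
```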

If you want to use all the variables in the data frame to create the interaction, you can use do.call(interaction,ff).
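
As a sketch of the do.call approach, assuming every column of the data frame is a categorical variable you want in the combination (the names combfac_all and combfac_obs, and the five-row subset, are made up for illustration):

```r
# Same illustrative grid as in the answer
ff <- expand.grid(
  agegroup      = factor(c("<20", "20-30", ">30")),
  disease.level = factor(0:2),
  performance   = factor(c("<60", ">=60"))
)

# A data frame is a list of columns, so do.call() passes each column
# to interaction() as a separate argument
combfac_all <- do.call(interaction, ff)
nlevels(combfac_all)  # 18

# With real data, some combinations may never occur; drop = TRUE
# discards those unused levels (here: only the first five rows are kept)
combfac_obs <- do.call(interaction, c(as.list(ff[1:5, ]), list(drop = TRUE)))
nlevels(combfac_obs)  # 5
```

With about 10 variables the number of possible combinations grows quickly, so dropping unused levels is probably what you want in practice.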

If you did want the dummy variables you would do model.matrix(~combfac-1) to get them.
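
A short sketch of that last step, again on the illustrative grid (the name dummies is an assumption). The -1 in the formula removes the intercept, so each of the 18 levels gets its own 0/1 column instead of one level being absorbed as the baseline:

```r
# Same illustrative grid and combined factor as in the answer
ff <- expand.grid(
  agegroup      = factor(c("<20", "20-30", ">30")),
  disease.level = factor(0:2),
  performance   = factor(c("<60", ">=60"))
)
combfac <- with(ff, interaction(agegroup, disease.level, performance))

# One indicator column per level; each row has exactly one 1
dummies <- model.matrix(~ combfac - 1)

dim(dummies)            # 18 rows, 18 columns
head(colnames(dummies)) # column names are the level labels prefixed with "combfac"
```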

Ben Bolker answered Oct 06 '22 00:10