
Building a classification tree with categorical variables using rpart

I have a data set with 14 features. A few of them are shown below; sex and marital status are categorical variables stored as numeric codes.

height,sex,maritalStatus,age,edu,homeType

SEX
         1. Male
         2. Female

MARITAL STATUS
         1. Married
         2. Living together, not married
         3. Divorced or separated
         4. Widowed
         5. Single, never married

I am using the rpart library in R to build a classification tree with the following call:

rfit = rpart(homeType ~., data = trainingData, method = "class", cp = 0.0001)

This gives me a decision tree that does not consider sex and marital status as factors.

I am thinking of using as.factor for this:

sex = as.factor(trainingData$sex)
ms = as.factor(trainingData$maritalStatus)

But I am not sure how to pass this information to rpart. Since the data argument in rpart() takes the "trainingData" data frame, it will always use the values that are in that data frame. I am fairly new to R and would appreciate help with this.

user4251309 asked Nov 14 '14 07:11


1 Answer

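You could make the changes to the trainingData data frame directly, then run rpart(). Because rpart() reads the column types from the data frame, columns stored as factors are treated as categorical rather than as numeric codes.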

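library(rpart)
# Convert the coded columns to factors in place so rpart() treats them as categorical
trainingData$sex = as.factor(trainingData$sex)
trainingData$maritalStatus = as.factor(trainingData$maritalStatus)
rfit = rpart(homeType ~ ., data = trainingData, method = "class", cp = 0.0001)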
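
To check that the conversion took effect, here is a minimal sketch on made-up data. The `toy` data frame, its random values, and the shortened marital status labels are assumptions for illustration only; they stand in for the real trainingData, which is not shown in the question. Using factor() with explicit labels is an optional variation on as.factor() that makes the printed splits easier to read.

```r
library(rpart)

# Hypothetical stand-in for trainingData; all values are randomly generated
set.seed(1)
n <- 200
toy <- data.frame(
  height        = round(rnorm(n, mean = 170, sd = 10)),
  sex           = sample(1:2, n, replace = TRUE),
  maritalStatus = sample(1:5, n, replace = TRUE),
  age           = sample(18:80, n, replace = TRUE),
  homeType      = factor(sample(1:3, n, replace = TRUE))  # assumed 3 classes
)

# Convert the coded columns in place; labels follow the codebook in the question
toy$sex <- factor(toy$sex, levels = 1:2, labels = c("Male", "Female"))
toy$maritalStatus <- factor(
  toy$maritalStatus,
  levels = 1:5,
  labels = c("Married", "Living together", "Divorced or separated",
             "Widowed", "Single")
)

# Both columns should now be listed as Factor rather than int
str(toy[, c("sex", "maritalStatus")])

# Same call as in the answer, applied to the toy data
fit <- rpart(homeType ~ ., data = toy, method = "class", cp = 0.0001)
print(fit)
```

If the tree splits on one of the converted columns, the printed rule should list factor levels (for example a set of marital status labels) instead of a numeric threshold such as `maritalStatus < 2.5`. Since the toy data is random, which variables actually appear in the tree will vary.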
Jean V. Adams answered Oct 04 '22 12:10