
C5.0 models require a factor outcome

Tags: r

I am working with credit.csv to build a decision tree. The data is available at:

https://github.com/stedy/Machine-Learning-with-R-datasets/blob/master/credit.csv

and I have run the following steps:

credit<-read.csv("credit.csv")
set.seed(12345)
credit_rand<-credit[order(runif(1000)),]
credit_train<-credit_rand[1:900,]
credit_test<-credit_rand[901:1000,]
library(C50)
credit_model<-C5.0(credit_train[-21],credit_train$default)

According to the guide I am following, I should drop the last column, which holds the value of default, but I get the following error:

Error in C5.0.default(credit_train[, -21], credit_train$default) : 
  C5.0 models require a factor outcome

I have tried changing the last line to:

credit_model<-C5.0(credit_train[,-21],credit_train$default)

but with no success at all.

Any help?

asked Jun 24 '15 14:06 by Little


1 Answer

Your problem is that C5.0 models require a factor outcome. You have given the outcome as credit_train$default, which takes the values 1 and 2, but R has read it as an integer vector rather than a factor:

str(credit_train$default)
int [1:900] 2 1 1 1 2 1 2 2 1 1 ...

The solution then is to convert it to a factor:

credit_train$default<-as.factor(credit_train$default)
str(credit_train$default)

Factor w/ 2 levels "1","2": 2 1 1 1 2 1 2 2 1 1 ...
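
As a side note, as.factor() keeps the codes "1" and "2" as the level names. If readable labels are preferred, factor() can attach them. The sketch below uses a toy vector standing in for credit_train$default, and the mapping 1 = "no", 2 = "yes" is an assumption that should be checked against the dataset's documentation:

```r
# Toy stand-in for credit_train$default: integer codes 1/2 as read by read.csv
default_codes <- c(2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L)

# as.factor() keeps the codes as the level names "1" and "2"
default_plain <- as.factor(default_codes)

# factor() with explicit levels and labels;
# the meaning of 1 and 2 here is an assumption, not taken from the dataset
default_labeled <- factor(default_codes, levels = c(1, 2), labels = c("no", "yes"))

is.factor(default_plain)
levels(default_plain)
levels(default_labeled)
table(default_labeled)
```

Either form satisfies the factor requirement. The labeled version just makes the model summary and confusion tables easier to read.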

And then run your training:

credit_model<-C5.0(credit_train[-21],credit_train$default)
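
It may also be worth converting the outcome once on the full data frame, before the train/test split, so that both subsets share the same factor levels. Here is a minimal sketch of that order of operations. The data frame toy and its columns are hypothetical stand-ins for credit.csv, and the C5.0 call is guarded so the rest runs even if the C50 package is not installed:

```r
set.seed(12345)

# Hypothetical stand-in for credit.csv: two predictors and an integer 1/2 outcome
n <- 200
toy <- data.frame(
  amount   = round(runif(n, 250, 18000)),
  duration = sample(6:60, n, replace = TRUE),
  default  = sample(1:2, n, replace = TRUE, prob = c(0.7, 0.3))
)

# Convert the outcome once, before splitting, so both subsets share the same levels
toy$default <- as.factor(toy$default)

# Shuffle and split, in the same way as in the question
toy_rand  <- toy[order(runif(n)), ]
toy_train <- toy_rand[1:180, ]
toy_test  <- toy_rand[181:200, ]

# Locate the outcome column by name instead of a hard-coded position such as 21
outcome_col <- which(names(toy_train) == "default")

# Train and predict only if the C50 package is available
if (requireNamespace("C50", quietly = TRUE)) {
  toy_model <- C50::C5.0(toy_train[-outcome_col], toy_train$default)
  toy_pred  <- predict(toy_model, toy_test[-outcome_col])
  print(table(actual = toy_test$default, predicted = toy_pred))
}
```

Looking the column up by name is a design choice for the sketch. With the real data, credit_train[-21] works as long as default stays in column 21.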

answered Nov 03 '22 06:11 by jeremycg