Classification - Usage of factor levels

Tags: r, classification, prediction

I am currently working on a predictive model for a churn problem.
Whenever I try to run the following model, I get this error:

At least one of the class levels is not a valid R variable name. This will cause errors when class probabilities are generated because the variables names will be converted to X0, X1. Please use factor levels that can be used as valid R variable names.

library(caret)

# Report ROC, sensitivity and specificity alongside accuracy and kappa
fivestats <- function(...) c(twoClassSummary(...), defaultSummary(...))
fitControl.default    <- trainControl( 
    method  = "repeatedcv"
  , number  = 10
  , repeats = 1 
  , verboseIter = TRUE
  , summaryFunction  = fivestats
  , classProbs = TRUE
  , allowParallel = TRUE)
set.seed(1984)

rpartGrid             <-  expand.grid(cp = seq(from = 0, to = 0.1, by = 0.001))
rparttree.fit.roc <- train( 
    churn ~ .
  , data      = training.dt  
  , method    = "rpart"
  , trControl = fitControl.default
  , tuneGrid  = rpartGrid
  , metric = 'ROC'
  , maximize = TRUE
)

The attached picture shows my data. I have already converted some columns from chr to factor.

DATA OVERVIEW

I do not understand what the problem is. If I converted all of the data into factors, a variable such as total_airtime_out would probably end up with around 9000 levels.

Thanks for any kind of help!

asked May 20 '17 10:05

Simon

2 Answers

I can't reproduce your error exactly, but my educated guess is that the error message tells you everything you need to know:

At least one of the class levels is not a valid R variable name. This will cause errors when class probabilities are generated because the variables names will be converted to X0, X1. Please use factor levels that can be used as valid R variable names.

Emphasis mine. The levels of your response variable are "0" and "1", which aren't valid variable names in R (you can't do 0 <- "my value"). Presumably the problem will go away if you rename the levels of the response variable with something like

levels(training.dt$churn) <- c("first_class", "second_class")

as per this Q.
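
To see why "0" and "1" are rejected, and to rename the levels without depending on their order, something like the following may help. It is a minimal sketch in base R only: the toy churn vector stands in for training.dt$churn, and the labels "no" and "yes" are illustrative choices, not anything the data requires.

```r
# Toy response standing in for training.dt$churn (hypothetical data)
churn <- factor(c(0, 1, 0, 0, 1))

# A level is a syntactically valid name only if make.names() leaves it unchanged
valid <- make.names(levels(churn)) == levels(churn)

# Relabel by naming each old level explicitly, so the mapping
# does not depend on the order in which the levels happen to be stored
churn_named <- factor(churn, levels = c("0", "1"), labels = c("no", "yes"))

# The new levels pass the same check
valid_named <- make.names(levels(churn_named)) == levels(churn_named)
```

Only the response needs this treatment, since the error message refers to the class levels; numeric predictors such as total_airtime_out can stay numeric.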

answered Oct 28 '22 13:10

einar

How about this base function:

 make.names(churn) ~ .,

to "make syntactically valid names out of character vectors"?

Source
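
One way to use make.names here is to apply it to the factor levels before fitting, rather than inside the formula. The sketch below uses a toy vector in place of the real churn column. Doing the conversion up front keeps the formula as churn ~ ., which sidesteps any question of how the . expands when the response is wrapped in a function call.

```r
# Toy response standing in for training.dt$churn (hypothetical data)
churn <- factor(c("0", "1", "1", "0"))

# Rewrite only the level labels; the observations keep their class membership
levels(churn) <- make.names(levels(churn))

levels(churn)
# "X0" "X1"
```

These are the same X0 and X1 names the error message mentions, so the class probability columns will line up with the factor levels.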

answered Oct 28 '22 12:10

Dbercules
