Error when using Random Forest in R

Tags:

r

I am using a dataset in which the column mvar_1 holds the name of one of 5 parties that each citizen voted for last year. The other variables are demographic variables, such as the number of rallies attended for each party, and so on.

When I use the following code:

data.model.rf = randomForest(mvar_1 ~ mvar_2 + mvar_3 + mvar_4 + mvar_5 + 
                             mvar_6 + mvar_7 + mvar_8 + mvar_9 + mvar_10 + 
                             mvar_11 + mvar_15 + mvar_17 + mvar_18 + mvar_21 + 
                             mvar_22 + mvar_23 + mvar_24 + mvar_25 + mvar_26 +
                             mvar_28, data=data.train, ntree=20000, mtry=15, 
                             importance=TRUE, na.action = na.omit )

This error message appears:

Error in randomForest.default(m, y, ...) : 
  Can not handle categorical predictors with more than 53 categories.

asked Oct 13 '15 09:10

akhil verma

1 Answer

One of your mvar columns is a factor with more than 53 levels.

You may have a categorical variable with many levels, such as demographic group, in which case you should aggregate it into fewer levels to use this package. (See here for the best way of doing it)
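As a rough illustration of aggregating levels, here is a minimal base R sketch that lumps rare levels into a single "Other" level. The `group` factor, the `lump_rare` helper and the `min_count` threshold are all made up for this example; which levels to merge in your data depends on what the categories mean.

```r
# Hypothetical factor: 5 common groups (100 rows each) and 55 rare ones (1 row each).
group <- factor(c(rep(paste0("g", 1:5), each = 100), paste0("rare", 1:55)))
nlevels(group)  # 60, more than randomForest accepts

# Illustrative helper (not part of any package): keep levels with at least
# min_count rows and lump every other level into a single "Other" level.
lump_rare <- function(f, min_count = 10, other = "Other") {
  counts <- table(f)
  keep <- names(counts)[counts >= min_count]
  x <- as.character(f)
  x[!is.na(x) & !(x %in% keep)] <- other
  factor(x)
}

group_lumped <- lump_rare(group)
nlevels(group_lumped)  # 6: g1..g5 plus "Other"
```

Lumping by frequency is only one option; grouping levels by domain knowledge (for example, regions instead of towns) is often more meaningful.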

More likely, you have a non-categorical variable incorrectly typed as a factor. In this case you should fix it by typing your variable correctly. E.g. to get a numeric from a factor, you call as.numeric(as.character(myfactor)).
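To see why the as.character step matters, here is a small sketch with a made-up factor `f`. Calling as.numeric directly on a factor returns the internal level codes rather than the values the labels spell out.

```r
# Hypothetical numeric data that was read in as a factor.
f <- factor(c("300", "10", "2.5"))

# Direct conversion gives the integer level codes (levels sort as "10", "2.5", "300").
codes <- as.numeric(f)
codes   # 3 1 2

# Going through character first recovers the actual numbers.
values <- as.numeric(as.character(f))
values  # 300.0 10.0 2.5
```

On your data this would look like `data.train$some_mvar <- as.numeric(as.character(data.train$some_mvar))`, where `some_mvar` stands for whichever column turns out to be mistyped.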

If you don't know what a factor is, the second option is probably the cause. Run summary(data.train); this will help you see which mvar columns are incorrectly typed. For a column typed as numeric, you will see the min, max, mean, median, etc. For a numeric variable incorrectly typed as a factor, you will instead see the number of occurrences of each level.

In any case, inspecting the columns will help: summary shows the level counts for each factor, and str(data.train) reports the number of levels of each factor directly. Any variable with more than 53 levels is causing the issue.
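If the data frame has many columns, you can also count the levels programmatically. The sketch below uses a toy data frame `df` as a stand-in for data.train; the column names are invented for the example.

```r
# Toy stand-in for data.train with one offending factor column.
df <- data.frame(
  mvar_a = factor(paste0("id", 1:60)),          # 60 levels: too many
  mvar_b = factor(rep(c("x", "y", "z"), 20)),   # 3 levels: fine
  mvar_c = as.numeric(1:60)                     # numeric: not a factor
)

# Number of levels for every factor column.
is_fac <- sapply(df, is.factor)
n_levels <- sapply(df[is_fac], nlevels)
n_levels

# Names of the columns that exceed the 53-category limit from the error message.
too_many <- names(n_levels)[n_levels > 53]
too_many  # "mvar_a"
```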

answered Sep 22 '22 11:09

asachet
