If I run a randomForest(y ~ x, data = df) model where x is a factor variable with more than 53 levels, I get
Error in randomForest.default(m, y, ...) :
Can not handle categorical predictors with more than 53 categories.
If I change x to as.character(x) and re-run, I get no errors.
What's the difference behind the scenes? Aren't both types treated as categorical variables?
The main difference is that factors have predefined levels, so their value can only be one of those levels or NA, whereas characters can be anything.
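To make the factor/character distinction concrete, here is a minimal base R sketch. The vectors f, f2 and s are made up for illustration and are not from the question's data.

```r
# A factor stores integer codes plus a fixed set of levels.
f <- factor(c("a", "b", "a"), levels = c("a", "b"))
levels(f)        # the predefined levels
as.integer(f)    # the underlying integer codes

# Assigning a value outside the levels produces NA (R also emits a warning,
# suppressed here to keep the output clean).
f2 <- f
suppressWarnings(f2[1] <- "z")
is.na(f2[1])

# A character vector has no such restriction.
s <- c("a", "b", "a")
s[1] <- "z"
s
```

The fixed level set is what lets a modelling function count categories up front, which is presumably where the 53-category check comes from.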
In terms of general theory, random forests can work with both numeric and categorical data. The function randomForest (documentation here) supports categorical data coded as factors, so that would be your type.
One of the most important features of the random forest algorithm is that it can handle data sets containing continuous variables, as in regression, and categorical variables, as in classification. It tends to give better results for classification problems.
Yes, it can be used for both continuous and categorical target (dependent) variables. In a random forest or decision tree, a classification model refers to a factor/categorical dependent variable, and a regression model refers to a numeric or continuous dependent variable.
I guess each category's name is a numeric value (because randomForest() can't handle a character column when it consists of non-numeric strings). randomForest() treats a character column that consists of numeric values as a numeric variable (i.e., numeric class), NOT as a categorical variable (i.e., factor class). If you change each category's name, the result will change.
Here is my example. If x_ is of factor class, the same results are returned regardless of the labels. If x_ is of integer class, or of character class but composed of numeric values, the outputs depend on the values. The result you got with as.character(x) is CLEARLY WRONG!
library(randomForest)
set.seed(1); cw <- data.frame(y = subset(ChickWeight, Time == 18)$weight, x1 = sample(47))
cw$x2 <- as.factor(cw$x1)
cw$x3 <- as.character(cw$x1)
cw$x4 <- 47:1
cw$x5 <- as.factor(47:1)
cw$x6 <- as.character(47:1)
cw$x7 <- c(letters, LETTERS[1:21])
cw$x8 <- as.factor(cw$x7)
# %Var explained # class(x_)
set.seed(1); randomForest(y ~ x1, cw) # -29.61 integer1
set.seed(1); randomForest(y ~ x2, cw) # -0.42 factor
set.seed(1); randomForest(y ~ x3, cw) # -29.61 character (numeric name1)
set.seed(1); randomForest(y ~ x4, cw) # -31.78 integer2
set.seed(1); randomForest(y ~ x5, cw) # -0.42 factor
set.seed(1); randomForest(y ~ x6, cw) # -31.78 character (numeric name2)
set.seed(1); randomForest(y ~ x7, cw) # error character (letter name)
set.seed(1); randomForest(y ~ x8, cw) # -0.42 factor
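One way to avoid depending on how character columns get coerced is to convert them to factors explicitly before fitting. The following is a minimal base R sketch, not part of the randomForest package: the helper name chr_to_factor and the toy data frame d are hypothetical and exist only for illustration.

```r
# Hypothetical helper: convert every character column of a data frame to a factor.
chr_to_factor <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], factor)
  df
}

# Labels that look numeric but are really category names.
d <- data.frame(y = c(1.5, 2.0, 3.5), x = c("33", "10", "2"),
                stringsAsFactors = FALSE)

as.numeric(d$x)   # the numeric reading: the magnitudes now carry meaning

d2 <- chr_to_factor(d)
class(d2$x)       # "factor"
nlevels(d2$x)     # number of distinct categories
```

This only makes the categorical treatment explicit. It does not lift the 53-category limit, so a predictor with more levels than that would still need to be regrouped or encoded differently.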