I have a matrix of about 37k x 1024 consisting of 1s and 0s, used as categorical variables to indicate the presence or absence of a feature. I ran this matrix through the randomForest package in R as follows:
rfr <- randomForest(X_train,Y_train)
Here X_train is the matrix containing the categorical variables and Y_train is a vector of labels, one for every row in the matrix. When I run this, I get the following error:
Error in y - ymean : non-numeric argument to binary operator
In addition: Warning message:
In mean.default(y) : argument is not numeric or logical: returning NA
I checked for any null values or missing data but didn't find any.
I even converted the whole thing into a data.frame and tried the following:
rfr <- randomForest(labels ~ ., data = featureDF)
I still got the same errors.
I would appreciate any help with this, thanks!
I'd guess that labels is a character variable, but randomForest expects categorical outcome variables to be factors. Change it to a factor and see if the error goes away:
featureDF$labels = factor(featureDF$labels)
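Before converting, it can help to confirm the guess. Here is a minimal sketch in base R, using a made-up labels vector as a stand-in for the real one (the values are hypothetical, since the question shows no data):

```r
# Hypothetical stand-in for the real labels; the actual values are not shown in the question
labels <- c("spam", "ham", "spam", "ham", "other")

class(labels)       # how is the response stored?
is.factor(labels)   # randomForest only does classification when this is TRUE

# Convert to a factor; the distinct strings become the class levels
labels_f <- factor(labels)
is.factor(labels_f)
levels(labels_f)
```

The same conversion should apply to the matrix interface from the question: run Y_train <- factor(Y_train) before calling randomForest(X_train, Y_train).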
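The help for randomForest isn't explicit about the response needing to be a factor, but it's implied:
y A response vector. If a factor, classification is assumed, otherwise regression is assumed. If omitted, randomForest will run in unsupervised mode.
A character vector is not a factor, so randomForest takes the regression path and tries to compute y - mean(y), which is where the error and the warning in your output come from.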
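To see that behavior in action, here is a small sketch with simulated data shaped like the question's (a 0/1 matrix plus one label per row). The data, the sizes, and the names x, y_chr and y_num are made up for illustration, and ntree = 50 is an arbitrary choice to keep it fast. The type component of the fitted object reports which mode was used.

```r
library(randomForest)

set.seed(1)
# Simulated stand-in for X_train: 200 rows, 10 binary features
x <- matrix(rbinom(200 * 10, size = 1, prob = 0.5), nrow = 200)
colnames(x) <- paste0("f", 1:10)

# Simulated character labels, one per row
y_chr <- sample(c("a", "b", "c"), 200, replace = TRUE)

# Factor response: randomForest assumes classification
rf_cls <- randomForest(x, factor(y_chr), ntree = 50)
rf_cls$type

# Numeric response: randomForest assumes regression
y_num <- rnorm(200)
rf_reg <- randomForest(x, y_num, ntree = 50)
rf_reg$type
```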
You haven't provided sample data, so here's an example with the built-in iris data. Species is a factor in the original data frame. Let's convert Species to character:
library(randomForest)
iris$Species = as.character(iris$Species)
rf <- randomForest(Species ~ ., data=iris)
Error in y - ymean : non-numeric argument to binary operator
After converting Species back to factor, randomForest runs without error:
iris$Species = factor(iris$Species)
rf <- randomForest(Species ~ ., data=iris)
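One way to confirm the fix took effect is to inspect the fitted object. This sketch works on a copy (iris_copy, a name introduced here) so the built-in data set is left untouched; ntree = 100 and the seed are arbitrary choices.

```r
library(randomForest)

iris_copy <- iris
iris_copy$Species <- as.character(iris_copy$Species)  # reproduce the problem state
iris_copy$Species <- factor(iris_copy$Species)        # apply the fix

set.seed(42)
rf <- randomForest(Species ~ ., data = iris_copy, ntree = 100)

rf$type               # mode the forest was fitted in
levels(rf$predicted)  # class labels, taken from the factor levels
```

If type came back as anything other than classification, the response would still not be a factor.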