When I try to test my trained model on new test data that has fewer factor levels than my training data, predict()
returns the following error:
Type of predictors in new data do not match that of the training data.
My training data has a variable with 7 factor levels and my test data has that same variable with 6 factor levels (all 6 ARE in the training data).
When I add an observation containing the "missing" 7th level, prediction works, so I'm not sure why this happens or what the logic behind it is.
I could understand randomForest choking if the test set had more or different factor levels, but why does it fail when the training set is the one with "more" levels?
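R expects both the training and the test data to have exactly the same levels (even if one of the sets has no observations for a given level or levels). A factor is stored as integer codes that index into its levels, so when the test set lacks a level, every level after it gets a different code than in the training set. In your case, since the test set is missing a level that the training set has, you can do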
test$val <- factor(test$val, levels=levels(train$val))
to make sure it has all the same levels and they are coded the same way.
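Here is a minimal base-R sketch of the problem and the fix, assuming made-up toy data (only the names `train`, `test` and `val` come from the answer; the values are hypothetical). It does not call randomForest; it only shows how the integer codes behind a factor shift when a level is absent, and how re-leveling restores them.

```r
# Hypothetical toy data: the training factor has 7 levels (a-g),
# while the test data happens to contain no "c" rows.
train <- data.frame(val = factor(c("a", "b", "c", "d", "e", "f", "g")))
test  <- data.frame(val = factor(c("a", "b", "d", "e", "f", "g")))

# Built independently, the test factor gets only 6 levels, so every
# label after the missing "c" is renumbered.
print(levels(test$val))
print(as.integer(test$val)[test$val == "d"])    # 3 in test...
print(as.integer(train$val)[train$val == "d"])  # ...but 4 in train

# Fix: rebuild the test factor with the training levels. "c" becomes
# an empty level and "d" gets the same code as in train.
test$val <- factor(test$val, levels = levels(train$val))
print(levels(test$val))
print(as.integer(test$val)[test$val == "d"])    # now 4, matching train
```

The same one-line re-leveling can be applied to each factor column before calling predict().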
(reposted here to close out the question)