The help page for randomForest::randomForest()
says:
"classwt - Priors of the classes. Need not add up to one. Ignored for regression."
Could setting the classwt
parameter help when you have heavily unbalanced data, i.e. the class priors differ strongly?
How should I set classwt
when training a model on a dataset with 3 classes whose vector of priors is (p1, p2, p3), while in the test set the priors are (q1, q2, q3)?
The R package "randomForest" is used to create random forests.
ntree : number of trees. We want enough trees to stabalize the error but using too many trees is unncessarily inefficient, especially when using large data sets. mtry : the number of variables to randomly sample as candidates at each split. When mtry =p.
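A minimal sketch of these two parameters on the built-in iris data (illustrative values, not tuned for any particular problem):

library(randomForest)
# 500 trees is usually enough for the OOB error to stabilize; mtry = 2
# samples 2 of the 4 predictors as split candidates at each node.
rf_demo <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)
print(rf_demo)   # OOB error estimate and confusion matrix
plot(rf_demo)    # OOB error versus number of trees, to check stabilization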
Could setting the classwt parameter help when you have heavily unbalanced data, i.e. the class priors differ strongly?
Yes, setting classwt can be useful for unbalanced datasets. And I agree with joran that these values are transformed into probabilities used for sampling the training data (following Breiman's arguments in his original article).
How should classwt be set when the training dataset has 3 classes with a vector of priors (p1, p2, p3), while the test set priors are (q1, q2, q3)?
For training you can simply specify:
rf <- randomForest(x = x, y = y, classwt = c(p1, p2, p3))  # one weight per class, in the order of levels(y)
For the test set no priors can be used: 1) there is no such option in the predict
method of the randomForest package; 2) the weights only make sense for training the model, not for prediction.
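As an illustration, prediction is done without any prior argument (a minimal sketch; x_test is a hypothetical data frame with the same predictor columns as x, and p1..p3, q1..q3 are the placeholder priors from the question). If the test-set priors really do differ from the training priors, one standard post hoc correction, which is not part of randomForest itself, is to rescale the predicted class probabilities by q/p and renormalize:

pred_class <- predict(rf, newdata = x_test)                 # predicted classes
pred_prob  <- predict(rf, newdata = x_test, type = "prob")  # class probability matrix

# Optional post hoc prior shift: multiply each class column by q_i / p_i
# (same class order as levels(y)), then renormalize each row to sum to 1.
p <- c(p1, p2, p3); q <- c(q1, q2, q3)
adj <- sweep(pred_prob, 2, q / p, `*`)
adj <- adj / rowSums(adj)   # adjusted class probabilities under the test priors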