How can I use the R randomForest package with observation weights? I know that there is no such option in this package. I have 2 questions:
Are there any solutions to this problem using the randomForest package? At the moment I'm drawing samples from the data with the weights as sampling probabilities, so I can at least simulate it:
m <- nrow(data)
idx <- sample(m, m, replace = TRUE, prob = weights)  # sample row indices, not columns
data[idx, ]
It works, but are there other (better) solutions?
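The resampling workaround described above can be sketched end to end as follows. This is only a minimal illustration: the toy data frame, the column names x1, x2 and y, and the weights vector are all hypothetical stand-ins for the real data. Note that the rows are sampled by index, since calling sample() directly on a data frame operates on its columns.

```r
library(randomForest)

set.seed(42)

# Hypothetical toy data standing in for the real data set
n <- 500
data <- data.frame(x1 = rnorm(n), x2 = runif(n))
data$y <- 2 * data$x1 + sin(6 * data$x2) + rnorm(n, sd = 0.3)

# Hypothetical non-negative observation weights
weights <- runif(n)

# Draw row indices with probability proportional to the weights
idx <- sample(nrow(data), size = nrow(data), replace = TRUE, prob = weights)
boot <- data[idx, ]

# Fit an ordinary (unweighted) regression forest on the weighted resample
fit <- randomForest(y ~ x1 + x2, data = boot, ntree = 200)
print(fit)
```

A single weighted resample adds sampling noise on top of the forest's own bootstrap, so one option is to repeat this with several seeds and average the predictions.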
Are there any alternatives to the randomForest package? I found the party package (cforest), but it's terrible in terms of memory management (or I cannot use it the way I use the randomForest package). I have around 200k observations and 30-40 variables.
EDIT:
Sorry for not clarifying the details. I'm using the randomForest package for a regression problem (not classification). It is a time series, and every observation has its own weight. Later on, this weight is used to determine the model performance across test observations. The y variable is continuous.
I was looking for the same option as you, Pawel, for random forests. I found that the R package "ranger" supports it in the function "ranger" (through the parameter "case.weights").
The package was released in June 2016, so it is very young.
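As a rough sketch of how that could look for a regression problem like the one in the question: the toy data, column names and weights below are hypothetical, and only the ranger() call with case.weights reflects the package itself. In ranger, case.weights controls how likely each observation is to be drawn into the per-tree samples.

```r
library(ranger)

set.seed(42)

# Hypothetical toy data standing in for the real data set
n <- 500
data <- data.frame(x1 = rnorm(n), x2 = runif(n))
data$y <- 2 * data$x1 + sin(6 * data$x2) + rnorm(n, sd = 0.3)

# Hypothetical non-negative observation weights, one per row
weights <- runif(n)

# Regression forest where higher-weight rows are sampled more often per tree
fit <- ranger(y ~ x1 + x2, data = data, num.trees = 200,
              case.weights = weights)

# Predictions come back in the $predictions element
pred <- predict(fit, data = data)$predictions
```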
Best,
randomForest does have a "classwt" parameter that should allow you to account for differential sampling probabilities or even for differential costs. Admittedly, it is ignored for regression. Perhaps you should explain why you need to use weighting and what sort of y variable you are using.
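For the classification case, a minimal sketch of "classwt" might look like this. The built-in iris data and the particular prior values are chosen purely for illustration; the priors follow the order of the factor levels and need not sum to one.

```r
library(randomForest)

set.seed(42)

# Class priors in the order of levels(iris$Species):
# setosa, versicolor, virginica (values are arbitrary for illustration)
priors <- c(1, 1, 2)

fit <- randomForest(Species ~ ., data = iris, ntree = 200, classwt = priors)

# Out-of-bag confusion matrix for the fitted classifier
print(fit$confusion)
```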