
Incorporating observation weights in the randomForest package

How can I use the R randomForest package with observation weights? I know that there is no such option in this package. I have 2 questions:

  1. Are there any solutions to this problem using the randomForest package? At the moment I'm resampling the data with the weights as sampling probabilities, so I can at least simulate it:

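    m <- nrow(data)
    # sample() on a data frame draws columns, so draw row indices instead
    idx <- sample(m, m, replace = TRUE, prob = weights)
    resampled <- data[idx, ]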
    

    It works, but are there other (better) solutions?

  2. Are there any alternatives to the randomForest package? I found the party package (cforest), but it's terrible in terms of memory management (or I cannot use it the way I use the randomForest package). I have around 200k observations and 30-40 variables.

EDIT:

Sorry for not clarifying the details. I'm using the randomForest package for a regression problem (not classification). The data is a time series and every observation has its own weight. Later on, this weight is used to determine the model performance across test observations. The y variable is continuous.

Pawel asked Mar 25 '12 12:03


2 Answers

I was looking for the same option as you, Pawel. I found that the R package "ranger" supports it in its function "ranger" (through the parameter "case.weights").

The package was released in June 2016, so it is very young.
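A minimal sketch of how the "case.weights" parameter mentioned above could be used for a regression problem like the one in the question. The data, the column names (x1, x2, y) and the weights w are made up for illustration. As far as I understand the ranger documentation, the weights change how often an observation is drawn into each tree's sample, not the split criterion itself, so treat this as one possible approach rather than a full weighted fit.

```r
library(ranger)

set.seed(42)

# Hypothetical data: two predictors and a continuous response
n <- 500
train <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
train$y <- 2 * train$x1 - train$x2 + rnorm(n, sd = 0.5)

# Hypothetical observation weights (any non-negative values)
w <- runif(n, min = 0.1, max = 1)

# Observations with larger weights are drawn more often into each tree's sample
fit <- ranger(y ~ ., data = train, num.trees = 200, case.weights = w)

# Weighted RMSE, assuming the same weights are used to score predictions
# (computed on the training data here only to keep the sketch short)
pred <- predict(fit, data = train)$predictions
weighted_rmse <- sqrt(sum(w * (train$y - pred)^2) / sum(w))
```

In practice you would compute the weighted error on held-out test observations with their own weights.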

Best,

Ooona answered Nov 15 '22 12:11


randomForest does have a "classwt" parameter that should allow you to account for differential sampling probabilities or even for differential costs. Admittedly, it is ignored for regression. Perhaps you should explain why you need to use weighting and what sort of y variable you are using.
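For reference, a short sketch of "classwt" in a classification setting, using the built-in iris data. The prior values are arbitrary and only for illustration; they are given in the order of the factor levels of the response.

```r
library(randomForest)

set.seed(1)

# Arbitrary class priors, one per level of iris$Species (setosa, versicolor, virginica)
priors <- c(1, 1, 5)

# classwt applies to classification only; it is ignored when y is continuous
fit_cls <- randomForest(Species ~ ., data = iris, ntree = 100, classwt = priors)

print(fit_cls$confusion)
```

This does not help with a continuous y, so it only applies if the problem can be framed as classification.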

IRTFM answered Nov 15 '22 12:11