 

How to handle class imbalance in sklearn random forests: should I use sample_weight or the class_weight parameter?

I am trying to solve a binary classification problem with a class imbalance: my dataset has 210,000 records, of which 92% are 0s and 8% are 1s. I am using sklearn (v0.16) in Python for random forests.

I see there are two parameters, sample_weight and class_weight, when constructing the classifier. I am currently using class_weight="auto".

Am I using this correctly? What do class_weight and sample_weight actually do, and which should I be using?
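For reference, my construction looks roughly like this (a sketch: the data below is a synthetic stand-in for the real dataset, and class_weight="auto" from sklearn 0.16 was renamed "balanced" in 0.17):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in with roughly the same imbalance (~8% positives).
rng = np.random.RandomState(0)
X = rng.rand(2000, 5)
y = (rng.rand(2000) < 0.08).astype(int)

# In sklearn 0.16 this was class_weight="auto"; from 0.17 on the
# equivalent setting is class_weight="balanced".
clf = RandomForestClassifier(n_estimators=100,
                             class_weight="balanced",
                             random_state=0)
clf.fit(X, y)
```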

NG_21 asked Oct 19 '22 18:10

1 Answer

Class weights are what you should be using.

Sample weights allow you to specify a multiplier for the impact a particular sample has. Giving a sample a weight of 2.0 has roughly the same effect as if the point were present twice in the data (although the exact effect is estimator dependent).
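You can see this directly with a plain decision tree on toy data (a sketch; the points and weights here are purely illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: four points on a line, labels split in the middle.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# Give the last sample a weight of 2.0 ...
w = np.array([1.0, 1.0, 1.0, 2.0])
tree_weighted = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=w)

# ... versus literally including that point twice.
X_dup = np.vstack([X, X[-1:]])
y_dup = np.append(y, y[-1])
tree_duplicated = DecisionTreeClassifier(random_state=0).fit(X_dup, y_dup)

# For a deterministic tree, the two fits predict the same way here.
grid = np.linspace(-1.0, 4.0, 11).reshape(-1, 1)
same = np.array_equal(tree_weighted.predict(grid), tree_duplicated.predict(grid))
```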

Class weights have the same effect, but are used to apply a fixed multiplier to every sample that falls into the specified class. In terms of functionality you could use either; class_weight is provided as a convenience so you do not have to weight each sample manually. It is also possible to combine the two, in which case the class weights are multiplied by the sample weights.
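The equivalence is easy to check: in this sketch (synthetic data, an arbitrary 10x multiplier on the minority class) a class_weight dict and hand-built per-sample weights produce the same forest once the random seed is fixed:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(42)
X = rng.rand(500, 4)
y = (rng.rand(500) < 0.08).astype(int)

# class_weight applies one multiplier per class ...
clf_cw = RandomForestClassifier(random_state=0,
                                class_weight={0: 1.0, 1: 10.0}).fit(X, y)

# ... which is the same as hand-building per-sample weights.
sw = np.where(y == 1, 10.0, 1.0)
clf_sw = RandomForestClassifier(random_state=0).fit(X, y, sample_weight=sw)

same = np.array_equal(clf_cw.predict(X), clf_sw.predict(X))
```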

One of the main uses for the sample_weight argument on the fit() method is to allow boosting meta-algorithms like AdaBoostClassifier to operate on existing decision tree classifiers, increasing or decreasing the weights of individual samples as required by the algorithm.
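For example, AdaBoostClassifier refits its base tree on each boosting round, passing updated sample_weight values to fit() so that misclassified points get more influence. A minimal sketch with tree stumps (the data here is synthetic):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(400, 2)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # simple diagonal boundary

# On each round, AdaBoost calls fit(X, y, sample_weight=...) on a fresh
# depth-1 tree, with weights adjusted toward previously misclassified points.
boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                           n_estimators=50,
                           random_state=0)
boost.fit(X, y)
train_score = boost.score(X, y)
```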

David Maust answered Oct 21 '22 16:10