My dataset is very imbalanced: 90% negative samples and 10% positive samples. I tried the scale_pos_weight parameter and set it to 9. What is the mechanism behind this parameter? I am curious about what it actually does: does it repeat the positive samples 9 times? Or does it repeatedly draw 1/9 of the negative samples and train the model many times? Also, if I have a dataset whose negative samples only slightly outnumber the positive ones, do I still need to specify this parameter?
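In xgboost, scale_pos_weight is only used to multiply the weights of the positive samples, as the source code shows: in the logistic objective, the gradient and hessian of each positive sample are multiplied by it. No subsampling is done based on this parameter. For the gradient and hessian sums that the trees are grown from, setting it to 9 is therefore roughly equivalent to repeating each positive sample 9 times, although no rows are actually copied. The XGBoost documentation suggests sum(negative instances) / sum(positive instances) as a typical value. For a dataset that is only slightly skewed, this ratio is close to the default of 1, so setting the parameter has little effect.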
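To make the "multiply the weights" mechanism concrete, here is a minimal pure-Python sketch, assuming the binary logistic loss. It is an illustration, not XGBoost's actual code: the helper names (suggested_scale_pos_weight, logistic_grad_hess, node_sums) are hypothetical. It computes the suggested ratio for a 90/10 split, scales the gradient and hessian of positive samples by it, and checks that the resulting node sums match those obtained by literally repeating each positive sample 9 times.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def suggested_scale_pos_weight(labels):
    """Hypothetical helper: count(negatives) / count(positives)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    return neg / pos


def logistic_grad_hess(margin, label, scale_pos_weight=1.0):
    """Gradient and hessian of log loss w.r.t. the raw margin for one sample.

    Sketch of the weighting: only positive samples get their gradient and
    hessian multiplied by scale_pos_weight; negatives keep weight 1.
    """
    p = sigmoid(margin)
    w = scale_pos_weight if label == 1 else 1.0
    return w * (p - label), w * p * (1.0 - p)


def node_sums(margins, labels, scale_pos_weight=1.0):
    """Sum of gradients and hessians over the samples in one tree node.

    Tree split gains and leaf values are computed from sums like these.
    """
    g_sum = h_sum = 0.0
    for m, y in zip(margins, labels):
        g, h = logistic_grad_hess(m, y, scale_pos_weight)
        g_sum += g
        h_sum += h
    return g_sum, h_sum


# 90 negatives and 10 positives with arbitrary current margins.
neg_margins = [-0.5 + 0.01 * i for i in range(90)]
pos_margins = [0.2 + 0.05 * i for i in range(10)]
labels = [0] * 90 + [1] * 10

spw = suggested_scale_pos_weight(labels)
print(spw)  # 9.0

# Option A: keep 10 positives, weight them by scale_pos_weight.
weighted = node_sums(neg_margins + pos_margins, labels, spw)

# Option B: no weighting, but repeat every positive sample 9 times.
duplicated = node_sums(neg_margins + pos_margins * 9, [0] * 90 + [1] * 90)

print(all(math.isclose(a, b) for a, b in zip(weighted, duplicated)))  # True
```

The two options agree because the gradient and hessian sums are linear in the per-sample weights. They would diverge where actual rows matter, for example with row subsampling (subsample < 1), which samples the 10 real positive rows rather than 90 copies.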