XGboost python - classifier class weight option?

Is there a way to set different class weights for xgboost classifier? For example in sklearn RandomForestClassifier this is done by the "class_weight" parameter.

asked Feb 12 '17 by Fiction
People also ask

How do you determine your class weight?

In binary classification, class weights can be computed by taking the frequency of the positive and negative class and inverting it, so that when the weights are multiplied into the class loss, the underrepresented class incurs a much higher error than the majority class.
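As a minimal sketch of that inverse-frequency calculation (the labels and counts here are made up for illustration):

```python
import numpy as np

# Toy binary labels: 8 negative (0), 2 positive (1)
y = np.array([0] * 8 + [1] * 2)

# Inverse-frequency weights: n_samples / (n_classes * count_per_class)
counts = np.bincount(y)
class_weights = len(y) / (2 * counts)
print(class_weights)  # class 0 -> 0.625, class 1 -> 2.5
```

The minority class ends up with the larger weight, so its errors count more in the loss.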

What is scale POS weight XGBoost?

Generally, scale_pos_weight is the ratio of the number of negative-class observations to positive-class observations. For example, if the dataset has 90 observations of the negative class and 10 of the positive class, the ideal value of scale_pos_weight is 9. See the doc: http://xgboost.readthedocs.io/en/latest/parameter.html.
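A short sketch of that calculation (the counts are hypothetical, and passing the result to the classifier assumes the xgboost package is installed):

```python
import numpy as np

# Hypothetical labels: 90 negative (0), 10 positive (1)
y = np.array([0] * 90 + [1] * 10)

# scale_pos_weight = (# negative samples) / (# positive samples)
scale_pos_weight = (y == 0).sum() / (y == 1).sum()
print(scale_pos_weight)  # 9.0

# Then hand it to the classifier, e.g.:
# model = xgboost.XGBClassifier(scale_pos_weight=scale_pos_weight)
```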

Does XGBoost handle class imbalance?

SMOTE is a data-level approach to imbalanced classes, while XGBoost is an algorithm that can handle imbalanced data. This research uses SMOTE together with XGBoost (abbreviated SMOTEXGBoost) to handle imbalanced classes, and reports nearly identical accuracy, around 99%, for SMOTE and SMOTEXGBoost.

How do you assign class weights in random forest?

With bootstrap class weighting, the class weights are computed from the class distribution in each bootstrap sample rather than from the entire training dataset. In scikit-learn's RandomForestClassifier, this is achieved by setting the class_weight argument to 'balanced_subsample'.
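A minimal sketch of that option, assuming scikit-learn is installed (the synthetic dataset here is purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Imbalanced toy data: roughly 90% of one class, 10% of the other
X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)

# 'balanced_subsample' recomputes class weights inside each bootstrap sample
clf = RandomForestClassifier(class_weight='balanced_subsample', random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```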


2 Answers

For sklearn version < 0.19

Just assign each entry of your training data its class weight. First compute the class weights with sklearn's class_weight.compute_class_weight, then assign each row of the training data its appropriate weight.

I assume here that the training data has a column class containing the class number, and that there are nb_classes classes numbered 1 to nb_classes.

import numpy as np
from sklearn.utils import class_weight

# One weight per class, inversely proportional to class frequency
classes_weights = list(class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_df['class']),
    y=train_df['class'],
))

# Map each training row to the weight of its class
# (classes are numbered 1..nb_classes, hence the val - 1 index)
weights = np.ones(y_train.shape[0], dtype='float')
for i, val in enumerate(y_train):
    weights[i] = classes_weights[val - 1]

xgb_classifier.fit(X, y, sample_weight=weights)

Update for sklearn version >= 0.19

There is a simpler solution:

from sklearn.utils import class_weight
classes_weights = class_weight.compute_sample_weight(
    class_weight='balanced',
    y=train_df['class']
)

xgb_classifier.fit(X, y, sample_weight=classes_weights)
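On a toy label vector (chosen here just for illustration), compute_sample_weight returns one weight per row, inversely proportional to the frequency of that row's class:

```python
from sklearn.utils.class_weight import compute_sample_weight

# Toy labels: three rows of class 0, one row of class 1
y = [0, 0, 0, 1]

# 'balanced' weight per row: n_samples / (n_classes * count of that row's class)
weights = compute_sample_weight(class_weight='balanced', y=y)
print(weights)  # class-0 rows get 4/6 ~ 0.667, the class-1 row gets 4/2 = 2.0
```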
answered Sep 19 '22 by Firas Omrane


from sklearn.utils.class_weight import compute_sample_weight
xgb_classifier.fit(X, y, sample_weight=compute_sample_weight("balanced", y))
answered Sep 19 '22 by Tianhuang Su