Is there a way to set different class weights for xgboost classifier? For example in sklearn RandomForestClassifier this is done by the "class_weight" parameter.
Generating class weights
In binary classification, class weights can be derived by computing the frequency of the positive and negative classes and inverting them, so that, when multiplied by the per-class loss, errors on the underrepresented class count much more than errors on the majority class.
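As a minimal sketch of the inverse-frequency idea, assuming the labels are 0 and 1 in a NumPy array (the helper name inverse_frequency_weights is hypothetical and not part of any library):

```python
import numpy as np

def inverse_frequency_weights(y):
    """Return {class_label: 1 / relative_frequency} for the labels in y."""
    labels, counts = np.unique(y, return_counts=True)
    freqs = counts / counts.sum()  # share of each class in the data
    return {int(label): float(1.0 / f) for label, f in zip(labels, freqs)}

# Toy data: 90 negatives and 10 positives
y = np.array([0] * 90 + [1] * 10)
weights = inverse_frequency_weights(y)
```

With this toy split, the minority class gets a weight of about 10 and the majority class a weight of about 1.11, so a mistake on a positive sample costs roughly nine times more.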
Generally, scale_pos_weight is the ratio of the number of negative-class samples to the number of positive-class samples. For example, if the dataset has 90 observations of the negative class and 10 observations of the positive class, a typical value for scale_pos_weight is 9. See the doc: http://xgboost.readthedocs.io/en/latest/parameter.html.
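The ratio can be computed directly from the labels. This is only a sketch, assuming the negative class is encoded as 0 and the positive class as 1; the helper name negative_to_positive_ratio is made up for illustration:

```python
import numpy as np

def negative_to_positive_ratio(y):
    """Number of negative samples divided by number of positive samples."""
    y = np.asarray(y)
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == 0))
    if n_pos == 0:
        raise ValueError("no positive samples in y")
    return n_neg / n_pos

# Toy data matching the example: 90 negatives and 10 positives
y = np.array([0] * 90 + [1] * 10)
ratio = negative_to_positive_ratio(y)
```

The resulting value can then be passed to the classifier, for example xgboost.XGBClassifier(scale_pos_weight=ratio).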
SMOTE is a data-level approach for imbalanced classes, and XGBoost is one algorithm for imbalanced data problems. This research combines SMOTE and XGBoost, abbreviated as SMOTEXGBoost, to handle data with imbalanced classes. The results showed almost the same accuracy for SMOTE and SMOTEXGBoost, at 99%.
Random Forest With Bootstrap Class Weighting
It might be interesting to change the class weighting based on the class distribution in each bootstrap sample, instead of the entire training dataset. This can be achieved by setting the class_weight argument to the value 'balanced_subsample'.
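A short sketch of that setting in sklearn, using a synthetic imbalanced dataset from make_classification; the dataset sizes and n_estimators are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary problem with roughly a 90/10 class split
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# Class weights are recomputed from each tree's bootstrap sample
clf = RandomForestClassifier(n_estimators=50,
                             class_weight='balanced_subsample',
                             random_state=0)
clf.fit(X, y)
preds = clf.predict(X)
```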
For sklearn version < 0.19
Just assign each entry of your train data its class weight. First get the class weights with class_weight.compute_class_weight of sklearn, then assign each row of the train data its appropriate weight.
I assume here that the train data has the column class containing the class number. I also assume that there are nb_classes classes, numbered from 1 to nb_classes.
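import numpy as np
from sklearn.utils import class_weight

y = train_df['class']  # labels, assumed to run from 1 to nb_classes

# One weight per class, in the order given by np.unique(y)
classes_weights = list(class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(y),
    y=y))

# Give every training row the weight of its class
weights = np.ones(y.shape[0], dtype='float')
for i, val in enumerate(y):
    weights[i] = classes_weights[val - 1]

xgb_classifier.fit(X, y, sample_weight=weights)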
Update for sklearn version >= 0.19
There is a simpler solution:
from sklearn.utils import class_weight
classes_weights = class_weight.compute_sample_weight(
    class_weight='balanced',
    y=train_df['class']
)
xgb_classifier.fit(X, y, sample_weight=classes_weights)
from sklearn.utils.class_weight import compute_sample_weight
xgb_classifier.fit(X, y, sample_weight=compute_sample_weight("balanced", y))
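To see what 'balanced' produces, the sketch below compares compute_sample_weight with the formula n_samples / (n_classes * class_count) on a toy 90/10 label vector. The toy data is an assumption for illustration, and labels are taken to be 0 and 1 so that np.bincount applies:

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Toy labels: 90 samples of class 0 and 10 samples of class 1
y = np.array([0] * 90 + [1] * 10)
sample_weights = compute_sample_weight("balanced", y)

# Same quantity by hand: n_samples / (n_classes * count of the row's class)
counts = np.bincount(y)
manual = len(y) / (len(counts) * counts[y])
```

Here each majority-class row gets a weight of about 0.56 and each minority-class row a weight of 5, so both classes contribute the same total weight to the fit.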