Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiclass classification with xgboost classifier?

I am trying out multi-class classification with xgboost and I've built it using this code,

clf = xgb.XGBClassifier(max_depth=7, n_estimators=1000)

clf.fit(byte_train, y_train)
train1 = clf.predict_proba(train_data)
test1 = clf.predict_proba(test_data)

This gave me some good results. I've got log-loss below 0.7 for my case. But after looking through few pages I've found that we have to use another objective in XGBClassifier for multi-class problem. Here's what is recommended from those pages.

clf = xgb.XGBClassifier(max_depth=5, objective='multi:softprob', n_estimators=1000, 
                        num_classes=9)

clf.fit(byte_train, y_train)  
train1 = clf.predict_proba(train_data)
test1 = clf.predict_proba(test_data)

This code is also working but it's taking a lot of time to complete compared when to my first code.

Why is my first code also working for multi-class case? I have checked that it's default objective is binary:logistic used for binary classification but it worked really well for multi-class? Which one should I use if both are correct?

like image 253
user_12 Avatar asked Sep 18 '19 06:09

user_12


People also ask

Does XGBoost support multiclass classification?

Final Model Compared to our first iteration of the XGBoost model, we managed to improve slightly in terms of accuracy and micro F1-score. We achieved lower multi class logistic loss and classification error! We see that a high feature importance score is assigned to 'unknown' marital status.

Can we use XGBoost for classification?

XGBoost (eXtreme Gradient Boosting) is a popular supervised-learning algorithm used for regression and classification on large datasets. It uses sequentially-built shallow decision trees to provide accurate results and a highly-scalable training method that avoids overfitting.

Does smote work for multiclass classification?

SMOTE Oversampling for Multi-Class Classification. Oversampling refers to copying or synthesizing new examples of the minority classes so that the number of examples in the minority class better resembles or matches the number of examples in the majority classes.


3 Answers

In fact, even if the default obj parameter of XGBClassifier is binary:logistic, it will internally judge the number of class of label y. When the class number is greater than 2, it will modify the obj parameter to multi:softmax.

https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/sklearn.py

class XGBClassifier(XGBModel, XGBClassifierBase):
    # pylint: disable=missing-docstring,invalid-name,too-many-instance-attributes
    def __init__(self, objective="binary:logistic", **kwargs):
        super().__init__(objective=objective, **kwargs)

    def fit(self, X, y, sample_weight=None, base_margin=None,
            eval_set=None, eval_metric=None,
            early_stopping_rounds=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, callbacks=None):
        # pylint: disable = attribute-defined-outside-init,arguments-differ

        evals_result = {}
        self.classes_ = np.unique(y)
        self.n_classes_ = len(self.classes_)

        xgb_options = self.get_xgb_params()

        if callable(self.objective):
            obj = _objective_decorator(self.objective)
            # Use default value. Is it really not used ?
            xgb_options["objective"] = "binary:logistic"
        else:
            obj = None

        if self.n_classes_ > 2:
            # Switch to using a multiclass objective in the underlying
            # XGB instance
            xgb_options['objective'] = 'multi:softprob'
            xgb_options['num_class'] = self.n_classes_
like image 153
Joey Gao Avatar answered Oct 02 '22 05:10

Joey Gao


By default, XGBClassifier uses the objective='binary:logistic'. When you use this objective, it employs either of these strategies: one-vs-rest (also known as one-vs-all) and one-vs-one. It may not be the right choice for your problem at hand.

When you use objective='multi:softprob', the output is a vector of number of data points * number of classes. As a result, there is an increase in time complexity of your code.

Try setting objective=multi:softmax in your code. It is more apt for multi-class classification task.

like image 44
Saurabh Jain Avatar answered Oct 02 '22 06:10

Saurabh Jain


By default,XGBClassifier or many Classifier uses objective as binary but what it does internally is classifying (one vs rest) i.e. if you have 3 classes it will give result as (0 vs 1&2).If you're dealing with more than 2 classes you should always use softmax.Softmax turns logits into probabilities which will sum to 1.On basis of this,it makes the prediction which classes has the highest probabilities.As you can see the complexity increase as Saurabh mentioned in his answer so it will take more time.

like image 44
Sagar Dubey Avatar answered Oct 02 '22 05:10

Sagar Dubey