Multiclass classification with xgboost classifier?

Tags: python, machine-learning, scikit-learn, xgboost

I am trying out multi-class classification with xgboost, and I built the model using this code:

clf = xgb.XGBClassifier(max_depth=7, n_estimators=1000)

clf.fit(byte_train, y_train)
train1 = clf.predict_proba(train_data)
test1 = clf.predict_proba(test_data)

This gave me good results: I got a log-loss below 0.7 for my case. But after looking through a few pages, I found that we are supposed to use a different objective in XGBClassifier for multi-class problems. Here is what those pages recommend:

clf = xgb.XGBClassifier(max_depth=5, objective='multi:softprob', n_estimators=1000, 
                        num_classes=9)

clf.fit(byte_train, y_train)  
train1 = clf.predict_proba(train_data)
test1 = clf.predict_proba(test_data)

This code also works, but it takes much longer to complete than my first code.

Why does my first code also work for the multi-class case? I have checked that its default objective is binary:logistic, which is meant for binary classification, yet it worked really well for multi-class. Which one should I use if both are correct?

253

asked Sep 18 '19 06:09

user_12

3 Answers

In fact, even though the default objective parameter of XGBClassifier is binary:logistic, fit() internally checks the number of classes in the label y. When there are more than 2 classes, it changes the objective to multi:softprob and sets num_class accordingly, as the source shows:

https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/sklearn.py

class XGBClassifier(XGBModel, XGBClassifierBase):
    # pylint: disable=missing-docstring,invalid-name,too-many-instance-attributes
    def __init__(self, objective="binary:logistic", **kwargs):
        super().__init__(objective=objective, **kwargs)

    def fit(self, X, y, sample_weight=None, base_margin=None,
            eval_set=None, eval_metric=None,
            early_stopping_rounds=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, callbacks=None):
        # pylint: disable = attribute-defined-outside-init,arguments-differ

        evals_result = {}
        self.classes_ = np.unique(y)
        self.n_classes_ = len(self.classes_)

        xgb_options = self.get_xgb_params()

        if callable(self.objective):
            obj = _objective_decorator(self.objective)
            # Use default value. Is it really not used ?
            xgb_options["objective"] = "binary:logistic"
        else:
            obj = None

        if self.n_classes_ > 2:
            # Switch to using a multiclass objective in the underlying
            # XGB instance
            xgb_options['objective'] = 'multi:softprob'
            xgb_options['num_class'] = self.n_classes_
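
The branch at the end of this excerpt can be illustrated in isolation. The following is a minimal sketch, not the library's code: resolve_objective is a hypothetical helper that mirrors only the class-counting logic above, and it uses plain Python lists in place of NumPy arrays.

```python
def resolve_objective(requested_objective, y):
    """Hypothetical helper mirroring the class-count check in the excerpt above."""
    classes = sorted(set(y))  # stands in for np.unique(y)
    options = {"objective": requested_objective}
    if len(classes) > 2:
        # More than two distinct labels: switch to the multiclass objective
        options["objective"] = "multi:softprob"
        options["num_class"] = len(classes)
    return options


# Two distinct labels: the requested objective is kept
binary_options = resolve_objective("binary:logistic", [0, 1, 1, 0])
# Three distinct labels: the objective is replaced and num_class is set
multi_options = resolve_objective("binary:logistic", [0, 1, 2, 2, 1])

print(binary_options)
print(multi_options)
```

With three distinct labels, the requested binary:logistic is overridden, which is consistent with the first snippet in the question working for a 9-class problem.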

153

answered Oct 02 '22 05:10

Joey Gao

By default, XGBClassifier sets objective='binary:logistic'. A binary objective on its own can only handle more than two classes through a strategy such as one-vs-rest (also known as one-vs-all) or one-vs-one, which may not be the right choice for your problem.

When you use objective='multi:softprob', the output holds one probability per class for each data point, i.e. number of data points * number of classes values. As a result, the time complexity of your code increases.

Try setting objective='multi:softmax' in your code. It is better suited to multi-class classification tasks.
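
To make the output shapes concrete: the XGBoost parameter documentation describes multi:softprob as returning a vector of ndata * nclass values that can be reshaped into an ndata by nclass matrix, while multi:softmax returns the predicted class. A minimal sketch in plain Python of how the two relate, using made-up probabilities rather than real model output:

```python
# Made-up flat output for 2 data points and 3 classes (not real model output)
flat_probs = [0.1, 0.7, 0.2, 0.6, 0.3, 0.1]
n_classes = 3

# Reshape the flat vector into one row of class probabilities per data point
prob_rows = [
    flat_probs[i:i + n_classes]
    for i in range(0, len(flat_probs), n_classes)
]

# A softmax-style output keeps only the index of the most probable class per row
predicted_classes = [row.index(max(row)) for row in prob_rows]

print(prob_rows)
print(predicted_classes)
```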

answered Oct 02 '22 06:10

Saurabh Jain

By default, XGBClassifier, like many classifiers, is configured with a binary objective, and a binary objective handles multiple classes one-vs-rest, i.e. with 3 classes it separates class 0 from classes 1 and 2, and so on. If you're dealing with more than 2 classes, you should use softmax. Softmax turns logits into probabilities that sum to 1, and the prediction is the class with the highest probability. As Saurabh mentioned in his answer, the complexity increases, so it takes more time.
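
The softmax step described here can be sketched in a few lines of plain Python. The logits below are illustrative values, not output from a trained model, and softmax is written out by hand rather than taken from any library.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # illustrative scores for 3 classes
probs = softmax(logits)

# The predicted class is the index with the highest probability
predicted_class = probs.index(max(probs))

print(probs, predicted_class)
```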

answered Oct 02 '22 05:10

Sagar Dubey
