
multiclass classification in xgboost (python)

I can't figure out how to pass number of classes or eval metric to xgb.XGBClassifier with the objective function 'multi:softmax'.

I looked through a lot of documentation, but it only covers the sklearn wrapper, which is said to accept n_class/num_class.

My current setup looks like this:

    import xgboost as xgb
    from sklearn import cross_validation, metrics

    kf = cross_validation.KFold(y_data.shape[0],
        n_folds=10, shuffle=True, random_state=30)
    err = []  # holds the per-fold accuracy scores
    # xgb instance
    xgb_model = xgb.XGBClassifier(n_estimators=_params['n_estimators'],
        max_depth=_params['max_depth'], learning_rate=_params['learning_rate'],
        min_child_weight=_params['min_child_weight'],
        subsample=_params['subsample'],
        colsample_bytree=_params['colsample_bytree'],
        objective='multi:softmax', nthread=4)

    # cv
    for train_index, test_index in kf:
        xgb_model.fit(x_data[train_index], y_data[train_index], eval_metric='mlogloss')
        predictions = xgb_model.predict(x_data[test_index])
        actuals = y_data[test_index]
        err.append(metrics.accuracy_score(actuals, predictions))
user3804483 asked Sep 08 '16 09:09


1 Answer

You don't need to set num_class in the scikit-learn API for XGBoost classification. It is done automatically when fit is called. Look at xgboost/sklearn.py at the beginning of the fit method of XGBClassifier:

    evals_result = {}
    self.classes_ = np.unique(y)
    self.n_classes_ = len(self.classes_)

    xgb_options = self.get_xgb_params()

    if callable(self.objective):
        obj = _objective_decorator(self.objective)
        # Use default value. Is it really not used ?
        xgb_options["objective"] = "binary:logistic"
    else:
        obj = None

    if self.n_classes_ > 2:
        # Switch to using a multiclass objective in the underlying XGB instance
        xgb_options["objective"] = "multi:softprob"
        xgb_options['num_class'] = self.n_classes_
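
To make that last branch concrete, here is a minimal standalone sketch of the same inference step. It uses only the standard library; `infer_multiclass_params` is a hypothetical helper written for illustration and is not part of xgboost, and `sorted(set(...))` stands in for `np.unique`.

```python
def infer_multiclass_params(labels, base_params):
    """Hypothetical helper mirroring the quoted fit() logic (not xgboost API)."""
    classes = sorted(set(labels))   # stands in for np.unique(y)
    n_classes = len(classes)
    params = dict(base_params)      # copy so the caller's dict is left untouched
    if n_classes > 2:
        # Same switch as in the quoted snippet: multiclass objective + num_class
        params["objective"] = "multi:softprob"
        params["num_class"] = n_classes
    return classes, params


# Three distinct labels: num_class is filled in and the objective is replaced
base = {"objective": "multi:softmax", "max_depth": 3}
classes, params = infer_multiclass_params([0, 2, 1, 1, 0, 2], base)
print(classes)
print(params)

# Two distinct labels: the parameters pass through unchanged
binary_classes, binary_params = infer_multiclass_params(
    [0, 1, 1, 0], {"objective": "binary:logistic"})
print(binary_params)
```

So with three or more distinct labels in `y`, the wrapper itself supplies `num_class`, and in the quoted version it also overwrites the objective with `multi:softprob`, whatever was passed to the constructor.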
Adrien Renaud answered Sep 26 '22 01:09
