 

XGBoost for multilabel classification?


Is it possible to use XGBoost for multi-label classification? At the moment I use OneVsRestClassifier over GradientBoostingClassifier from sklearn. It works, but uses only one core of my CPU. My data has ~45 features, and the task is to predict about 20 columns of binary (boolean) data. The metric is mean average precision (map@7). If you have a short code example to share, that would be great.
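For reference, the setup described above looks roughly like the sketch below (the toy dataset is a stand-in for the real data). As a side note, OneVsRestClassifier accepts n_jobs=-1, which fits the per-label estimators in parallel and should address the single-core issue.

from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multiclass import OneVsRestClassifier

# toy stand-in for the data described above: 45 features, 20 binary label columns
X, y = make_multilabel_classification(n_samples=300, n_features=45,
                                      n_classes=20, random_state=0)

# one GradientBoostingClassifier per label column;
# n_jobs=-1 trains the per-label estimators in parallel (the single-core fix)
clf = OneVsRestClassifier(GradientBoostingClassifier(), n_jobs=-1)
clf.fit(X, y)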

asked Dec 01 '16 by user3318023

People also ask

Does XGBoost support multi-label classification?

Starting from version 1.6, XGBoost has experimental support for multi-output regression and multi-label classification in its Python package. Multi-label classification usually refers to targets that have multiple non-exclusive class labels.
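A minimal sketch of that native support, assuming XGBoost >= 1.6 (the dataset is synthetic, and per the docs the experimental multi-output path requires the hist tree method):

import xgboost as xgb
from sklearn.datasets import make_multilabel_classification

# toy multi-label dataset: 45 features, 20 binary label columns
X, y = make_multilabel_classification(n_samples=500, n_features=45,
                                      n_classes=20, random_state=0)

# experimental native multi-label support; y is passed as a 2-D indicator matrix
clf = xgb.XGBClassifier(tree_method="hist")
clf.fit(X, y)
print(clf.predict(X[:5]).shape)  # -> (5, 20), one column per label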

Can we use XGBoost for multi-class classification?

Yes. XGBoost supports multi-class classification natively through the multi:softmax and multi:softprob objectives: the former predicts a single class directly, while the latter returns a per-class probability distribution.
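A minimal sketch of that native multi-class support (the dataset and class count below are illustrative; the sklearn wrapper infers the number of classes from y):

import xgboost as xgb
from sklearn.datasets import make_classification

# toy 4-class problem
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)

# multi:softprob yields one probability per class
clf = xgb.XGBClassifier(objective="multi:softprob")
clf.fit(X, y)
print(clf.predict_proba(X[:3]).shape)  # -> (3, 4)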

Can XGBoost be used for text classification?

XGBoost is a general-purpose machine learning method: given labeled training data, it can be used to classify many kinds of data. It can be used for text classification too, typically after the raw text has been converted into numeric features (for example, TF-IDF vectors).
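A minimal sketch of that idea with a TF-IDF front end (the tiny corpus and labels below are invented purely for illustration):

import xgboost as xgb
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# hypothetical toy corpus: 1 = spam, 0 = not spam
texts = ["win a free prize now", "schedule the project meeting",
         "free money offer", "notes from today's meeting"]
labels = [1, 0, 1, 0]

# TF-IDF turns raw text into the numeric features XGBoost expects
model = make_pipeline(TfidfVectorizer(), xgb.XGBClassifier())
model.fit(texts, labels)
print(model.predict(["free prize meeting"]))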

Is XGBoost good for binary classification?

Yes. XGBoost performs well on binary classification out of the box, and a modified version that re-weights the classes, referred to as Class-Weighted XGBoost or Cost-Sensitive XGBoost, can offer better performance on binary classification problems with a severe class imbalance.
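A minimal sketch of the cost-sensitive idea using XGBoost's scale_pos_weight parameter (the imbalanced dataset below is synthetic):

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

# imbalanced binary problem: roughly 5% positives
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)

# scale_pos_weight ~ n_negative / n_positive up-weights the rare positive class
ratio = float(np.sum(y == 0)) / np.sum(y == 1)
clf = xgb.XGBClassifier(objective="binary:logistic", scale_pos_weight=ratio)
clf.fit(X, y)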


1 Answer

One possible approach, instead of using OneVsRestClassifier (which is aimed at multi-class tasks), is to use MultiOutputClassifier from the sklearn.multioutput module, which fits one classifier per target column.

Below is a small, reproducible code sample with the number of input features and target outputs requested by the OP.

import xgboost as xgb
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import accuracy_score

# create sample dataset
X, y = make_multilabel_classification(n_samples=3000, n_features=45, n_classes=20, n_labels=1,
                                      allow_unlabeled=False, random_state=42)

# split dataset into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# create XGBoost instance with default hyper-parameters
xgb_estimator = xgb.XGBClassifier(objective='binary:logistic')

# create MultiOutputClassifier instance with XGBoost model inside
multilabel_model = MultiOutputClassifier(xgb_estimator)

# fit the model
multilabel_model.fit(X_train, y_train)

# evaluate on test data
print('Accuracy on test data: {:.1f}%'.format(accuracy_score(y_test, multilabel_model.predict(X_test))*100))
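Two follow-up notes. First, MultiOutputClassifier also accepts n_jobs=-1 to fit the 20 per-label models in parallel, which addresses the single-core complaint in the question. Second, since the question's metric is mean average precision rather than subset accuracy, a rough way to score it is sketched below, building on the variables above (note that sklearn's average_precision_score is not the cutoff-at-7 map@7 variant):

import numpy as np
from sklearn.metrics import average_precision_score

# predict_proba returns one (n_samples, 2) array per label; keep P(label == 1)
proba = np.stack([p[:, 1] for p in multilabel_model.predict_proba(X_test)], axis=1)

# sample-averaged precision over the 20 binary labels
print('Mean average precision: {:.3f}'.format(
    average_precision_score(y_test, proba, average='samples')))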
answered Sep 28 '22 by Ric S