Is it possible to use XGBoost for multi-label classification? Currently I use OneVsRestClassifier over GradientBoostingClassifier from sklearn. It works, but uses only one core of my CPU. My data has ~45 features, and the task is to predict about 20 columns of binary (boolean) data. The metric is mean average precision (map@7). If you have a short code example to share, that would be great.
Starting from version 1.6, XGBoost has experimental support for multi-output regression and multi-label classification in its Python package. Multi-label classification refers to targets that have multiple non-exclusive class labels.
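As a minimal sketch of this native route (based on the XGBoost documentation; the dataset shape mirrors the OP's setup, and since the API is experimental it may change between releases), you can pass a 2D label matrix directly to XGBClassifier when tree_method='hist' is used:

import xgboost as xgb
from sklearn.datasets import make_multilabel_classification

# sample multi-label data: 45 features, 20 non-exclusive binary targets
X, y = make_multilabel_classification(n_samples=3000, n_features=45,
                                      n_classes=20, random_state=42)

# native multi-label support (experimental since 1.6): y may be a 2D matrix
clf = xgb.XGBClassifier(tree_method='hist')
clf.fit(X, y)

print(clf.predict(X).shape)  # (3000, 20): one binary prediction per label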
XGBoost is a gradient-boosting machine learning library. Given labelled historical data, it can be trained to predict new examples, and it handles many kinds of classification tasks, including text classification.
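For instance, here is a small hedged sketch of text classification with XGBoost (the toy texts and labels are invented for illustration; any vectorizer that produces numeric features would work in place of TF-IDF):

import xgboost as xgb
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# toy corpus; labels are hypothetical sentiment tags (1 = positive)
texts = ["great product, works well", "terrible, broke after a day",
         "really happy with this", "awful experience, do not buy"]
labels = [1, 0, 1, 0]

# TF-IDF turns text into the numeric (sparse) features XGBoost can consume
model = make_pipeline(TfidfVectorizer(), xgb.XGBClassifier())
model.fit(texts, labels)
print(model.predict(["works great"]))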
Weighting the loss by class yields what is sometimes called Class-Weighted XGBoost or Cost-Sensitive XGBoost, which can offer better performance on binary classification problems with a severe class imbalance.
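A minimal sketch of this technique (the toy dataset is an assumption for illustration; scale_pos_weight is a real XGBoost parameter, commonly set to the negative/positive count ratio):

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

# imbalanced binary toy data: roughly 95% negatives, 5% positives
X_train, y_train = make_classification(n_samples=2000, weights=[0.95],
                                       random_state=42)

# up-weight the positive class by the negative/positive ratio
neg, pos = np.bincount(y_train)
clf = xgb.XGBClassifier(scale_pos_weight=neg / pos)
clf.fit(X_train, y_train)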
One possible approach, instead of using OneVsRestClassifier (which is for multi-class tasks), is to use MultiOutputClassifier from the sklearn.multioutput module.
Below is a small reproducible example, using the number of input features and target outputs requested by the OP:
import xgboost as xgb
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import accuracy_score

# create sample dataset
X, y = make_multilabel_classification(n_samples=3000, n_features=45,
                                      n_classes=20, n_labels=1,
                                      allow_unlabeled=False, random_state=42)

# split dataset into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=123)

# create XGBoost instance with default hyper-parameters
xgb_estimator = xgb.XGBClassifier(objective='binary:logistic')

# create MultiOutputClassifier instance with XGBoost model inside
multilabel_model = MultiOutputClassifier(xgb_estimator)

# fit the model
multilabel_model.fit(X_train, y_train)

# evaluate on test data (note: accuracy_score on multi-label targets is
# subset accuracy, i.e. all 20 labels of a sample must match)
print('Accuracy on test data: {:.1f}%'.format(
    accuracy_score(y_test, multilabel_model.predict(X_test)) * 100))
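Two follow-up notes tied to the original question, as a hedged sketch (the n_jobs values below are placeholders to tune for your machine): unlike sklearn's GradientBoostingClassifier, XGBoost is multi-threaded, and MultiOutputClassifier can additionally fit the 20 per-label estimators in parallel. And since the stated metric is map@7, the per-label probabilities from predict_proba can be stacked to rank labels:

import numpy as np

# n_jobs on the estimator = threads inside each XGBoost fit;
# n_jobs on the wrapper = how many of the 20 label models fit in parallel
xgb_estimator = xgb.XGBClassifier(objective='binary:logistic', n_jobs=4)
multilabel_model = MultiOutputClassifier(xgb_estimator, n_jobs=2)
multilabel_model.fit(X_train, y_train)

# predict_proba returns one (n_samples, 2) array per label; take the
# positive-class column of each and rank the labels per sample for map@7
proba = np.stack([p[:, 1] for p in multilabel_model.predict_proba(X_test)],
                 axis=1)
top7 = np.argsort(-proba, axis=1)[:, :7]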