I can't figure out how to pass the number of classes or an eval metric to xgb.XGBClassifier with the objective function 'multi:softmax'.
I looked at a lot of documentation, but it only talks about the sklearn wrapper, which accepts n_class/num_class.
My current setup looks like this:

import xgboost as xgb
from sklearn import metrics, model_selection

kf = model_selection.KFold(n_splits=10, shuffle=True, random_state=30)
err = []  # to hold per-fold accuracy scores
# xgb instance; _params is a dict of hyperparameters defined elsewhere
xgb_model = xgb.XGBClassifier(n_estimators=_params['n_estimators'],
                              max_depth=_params['max_depth'],
                              learning_rate=_params['learning_rate'],
                              min_child_weight=_params['min_child_weight'],
                              subsample=_params['subsample'],
                              colsample_bytree=_params['colsample_bytree'],
                              objective='multi:softmax', nthread=4)
# cv
for train_index, test_index in kf.split(x_data):
    xgb_model.fit(x_data[train_index], y_data[train_index], eval_metric='mlogloss')
    predictions = xgb_model.predict(x_data[test_index])
    actuals = y_data[test_index]
    err.append(metrics.accuracy_score(actuals, predictions))
The 'multi:softmax' objective is the apt choice for a multi-class classification task like this one.
XGBoost has interfaces for various languages, including Python, and it integrates nicely with scikit-learn, the machine learning framework commonly used by Python data scientists. It can be used to solve classification and regression problems, so it is suitable for the vast majority of common data science challenges.
XGBoost is an effective machine learning model even on datasets where the class distribution is skewed. Before any modification or tuning is made to the XGBoost algorithm for imbalanced classification, it is important to test the default XGBoost model and establish a performance baseline.
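As a quick illustration of establishing such a baseline, a default XGBClassifier can be scored with stratified cross-validation. This is only a sketch: the skewed three-class dataset below is synthetic and made up for the example.

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical imbalanced three-class dataset (70/20/10 split), for illustration only
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)

baseline = xgb.XGBClassifier()  # untouched defaults as the baseline model

# Stratified folds preserve the skewed class ratios in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(baseline, X, y, cv=cv, scoring='accuracy')
print('baseline accuracy: %.3f +/- %.3f' % (scores.mean(), scores.std()))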
You don't need to set num_class in the scikit-learn API for XGBoost classification; it is set automatically when fit is called. Look at xgboost/sklearn.py, at the beginning of the fit method of XGBClassifier:
evals_result = {}
self.classes_ = np.unique(y)
self.n_classes_ = len(self.classes_)
xgb_options = self.get_xgb_params()

if callable(self.objective):
    obj = _objective_decorator(self.objective)
    # Use default value. Is it really not used ?
    xgb_options["objective"] = "binary:logistic"
else:
    obj = None

if self.n_classes_ > 2:
    # Switch to using a multiclass objective in the underlying XGB instance
    xgb_options["objective"] = "multi:softprob"
    xgb_options['num_class'] = self.n_classes_
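So by the time training starts, the wrapper has already inferred the number of classes from np.unique(y) and filled in num_class on the underlying booster. A minimal sketch to verify this, using made-up toy data (the array shapes and names are illustrative, and the eval_metric-in-fit and nthread arguments match the older xgboost API used in the question):

import numpy as np
import xgboost as xgb

# Toy three-class problem; labels must be 0..n_classes-1
rng = np.random.RandomState(0)
X = rng.rand(90, 4)
y = np.repeat([0, 1, 2], 30)

clf = xgb.XGBClassifier(objective='multi:softmax', nthread=4)
clf.fit(X, y, eval_metric='mlogloss')  # no num_class passed anywhere

print(clf.n_classes_)  # 3, inferred from np.unique(y)
print(clf.classes_)    # [0 1 2]

With the low-level xgb.train interface, by contrast, you would set 'num_class' (and 'eval_metric') yourself in the params dict; the sklearn wrapper takes care of that for you.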