understanding python xgboost cv

I would like to use the xgboost cv function to find the best parameters for my training data set. I am confused by the API. How do I find the best parameters? Is this similar to sklearn's grid_search cross-validation function? How can I find which of the options for the max_depth parameter ([2,4,6]) was determined optimal?

from sklearn.datasets import load_iris
import xgboost as xgb
iris = load_iris()
DTrain = xgb.DMatrix(iris.data, iris.target)
x_parameters = {"max_depth":[2,4,6]}
xgb.cv(x_parameters, DTrain)
...
Out[6]: 
   test-rmse-mean  test-rmse-std  train-rmse-mean  train-rmse-std
0        0.888435       0.059403         0.888052        0.022942
1        0.854170       0.053118         0.851958        0.017982
2        0.837200       0.046986         0.833532        0.015613
3        0.829001       0.041960         0.824270        0.014501
4        0.825132       0.038176         0.819654        0.013975
5        0.823357       0.035454         0.817363        0.013722
6        0.822580       0.033540         0.816229        0.013598
7        0.822265       0.032209         0.815667        0.013538
8        0.822158       0.031287         0.815390        0.013508
9        0.822140       0.030647         0.815252        0.013494
asked Dec 26 '15 by kilojoules

People also ask

How does XGBoost CV work?

XGBoost has a very useful function called cv, which performs cross-validation at each boosting iteration and thus returns the optimum number of trees required. The tree-specific parameters (max_depth, min_child_weight, gamma, subsample, colsample_bytree) can then be tuned for the chosen learning rate and number of trees.
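
For instance, a minimal sketch of that idea on the iris data from the question (the parameter values, objective, and early-stopping settings here are illustrative assumptions, not from the original post):

import xgboost as xgb
from sklearn.datasets import load_iris

iris = load_iris()
dtrain = xgb.DMatrix(iris.data, label=iris.target)

# One fixed parameter set -- cv evaluates it, it does not search over lists
params = {"max_depth": 4, "eta": 0.1,
          "objective": "multi:softmax", "num_class": 3}

# Cross-validate at every boosting iteration and stop once the test metric
# has not improved for 10 rounds; the returned frame then has one row per
# kept round, so its length is the optimum number of trees
history = xgb.cv(params, dtrain, num_boost_round=200, nfold=5,
                 early_stopping_rounds=10, seed=0)
print(len(history))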

How do you do cross-validation in XGBoost?

Another way to perform cross-validation with XGBoost is to use XGBoost's own non-Scikit-learn compatible API. "Non-Scikit-learn compatible" means that here we do not use the Scikit-learn cross_val_score() function; instead we use XGBoost's cv() function with explicitly created DMatrices.
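
The question's snippet above is exactly that native route. For contrast, a minimal sketch of the Scikit-learn compatible route (assuming the same iris data; XGBClassifier is xgboost's sklearn wrapper, and the hyperparameter values are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

iris = load_iris()

# The sklearn wrapper builds the DMatrix internally, so plain
# cross_val_score works directly on the numpy arrays
clf = XGBClassifier(max_depth=4, n_estimators=50)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
print(scores.mean())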

Does XGBoost support cross-validation?

Wide variety of tuning parameters: XGBoost internally has parameters for cross-validation, regularization, user-defined objective functions, missing values, tree parameters, a scikit-learn compatible API, and more.

What is Nrounds in XGBoost?

nrounds: the number of decision trees in the final model. objective: the training objective to use, where "binary:logistic" means a binary classifier.
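
Note that nrounds is the R API's name for this knob; in the Python API used in the question it is num_boost_round for xgb.train/xgb.cv (and n_estimators in the sklearn wrapper). A minimal sketch, assuming a binary dataset to match the binary:logistic objective:

import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
dtrain = xgb.DMatrix(X, label=y)

# binary:logistic trains a binary classifier; num_boost_round is the
# Python counterpart of R's nrounds (the number of trees in the model)
params = {"objective": "binary:logistic", "max_depth": 3}
booster = xgb.train(params, dtrain, num_boost_round=20)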


2 Answers

You can use GridSearchCV with xgboost through the xgboost sklearn API.

Define your classifier as follows:

from xgboost.sklearn import XGBClassifier
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in sklearn 0.20

xgb_model = XGBClassifier()  # pass any fixed parameters here

test_params = {
    'max_depth': [4, 8, 12]
}

# Cross-validates each candidate max_depth and keeps the best one
model = GridSearchCV(estimator=xgb_model, param_grid=test_params)
model.fit(train, target)  # train/target: your feature matrix and labels
print(model.best_params_)
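
If you would rather stay with the native cv function, one option is to loop over the candidate values yourself, since xgb.cv evaluates a single fixed parameter set per call. A hedged sketch on the question's iris data (the round and fold counts are arbitrary choices):

import xgboost as xgb
from sklearn.datasets import load_iris

iris = load_iris()
dtrain = xgb.DMatrix(iris.data, label=iris.target)

# Evaluate each max_depth separately and keep the best final test RMSE
results = {}
for depth in [2, 4, 6]:
    cv = xgb.cv({'max_depth': depth}, dtrain,
                num_boost_round=10, nfold=3, seed=0)
    results[depth] = cv['test-rmse-mean'].iloc[-1]

best_depth = min(results, key=results.get)
print(best_depth, results)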
answered Sep 27 '22 by Rohit

Cross-validation is used for estimating the performance of one set of parameters on unseen data.

Grid search evaluates a model with varying parameters to find the best possible combination of them.

The sklearn docs talk a lot about CV, and the two can be used in combination, but they each have very different purposes.

You might be able to fit xgboost into sklearn's grid search functionality. Check out the sklearn interface to xgboost for the smoothest application.
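
They can indeed be used together; a hedged sketch of that combination, with grid search picking max_depth on the inner folds and cross-validation estimating how well the whole tuning procedure generalizes (the data and grid are taken from the question, the fold counts are arbitrary):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from xgboost import XGBClassifier

iris = load_iris()

# Inner loop: grid search picks max_depth; outer loop: CV estimates
# the performance of the tuned model on unseen data
search = GridSearchCV(XGBClassifier(n_estimators=50),
                      param_grid={'max_depth': [2, 4, 6]}, cv=3)
scores = cross_val_score(search, iris.data, iris.target, cv=5)
print(scores.mean())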

answered Sep 27 '22 by Aske Doerge