I would like to use the xgboost cv function to find the best parameters for my training data set, but I am confused by the API. How do I find the best parameters? Is this similar to sklearn's grid_search cross-validation function? How can I find which of the options for the max_depth parameter ([2,4,6]) was determined to be optimal?
from sklearn.datasets import load_iris
import xgboost as xgb
iris = load_iris()
DTrain = xgb.DMatrix(iris.data, iris.target)
x_parameters = {"max_depth":[2,4,6]}
xgb.cv(x_parameters, DTrain)
...
Out[6]:
test-rmse-mean test-rmse-std train-rmse-mean train-rmse-std
0 0.888435 0.059403 0.888052 0.022942
1 0.854170 0.053118 0.851958 0.017982
2 0.837200 0.046986 0.833532 0.015613
3 0.829001 0.041960 0.824270 0.014501
4 0.825132 0.038176 0.819654 0.013975
5 0.823357 0.035454 0.817363 0.013722
6 0.822580 0.033540 0.816229 0.013598
7 0.822265 0.032209 0.815667 0.013538
8 0.822158 0.031287 0.815390 0.013508
9 0.822140 0.030647 0.815252 0.013494
XGBoost has a very useful function called “cv” which performs cross-validation at each boosting iteration and thus returns the optimum number of trees required. Tune the tree-specific parameters (max_depth, min_child_weight, gamma, subsample, colsample_bytree) for the chosen learning rate and number of trees.
Another way to perform cross-validation with XGBoost is to use XGBoost's own non-Scikit-learn-compatible API. “Non-Scikit-learn compatible” means that here we do not use the Scikit-learn cross_val_score() function; instead, we use XGBoost's cv() function with explicitly created DMatrices.
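Note that xgb.cv() evaluates a single, fixed parameter dict per call; unlike GridSearchCV, it will not sweep a list such as [2,4,6] for you. Here is a minimal sketch of doing that sweep by hand with explicit DMatrices; the eta, objective, fold count, and round count are illustrative assumptions, not values from the question:

from sklearn.datasets import load_iris
import xgboost as xgb

iris = load_iris()
dtrain = xgb.DMatrix(iris.data, label=iris.target)

best_depth, best_rmse = None, float("inf")
for depth in [2, 4, 6]:
    # One xgb.cv call per candidate value; all other parameters stay fixed.
    params = {"max_depth": depth, "eta": 0.3, "objective": "reg:squarederror"}
    cv_results = xgb.cv(params, dtrain, num_boost_round=50, nfold=5,
                        early_stopping_rounds=10)  # stop once test RMSE plateaus
    rmse = cv_results["test-rmse-mean"].min()
    if rmse < best_rmse:
        best_depth, best_rmse = depth, rmse

print(best_depth, best_rmse)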
Wide variety of tuning parameters: XGBoost internally has parameters for cross-validation, regularization, user-defined objective functions, missing values, tree parameters, a scikit-learn-compatible API, etc.
nrounds: the number of decision trees in the final model. objective: the training objective to use, where “binary:logistic” means a binary classifier.
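For reference, here is how those two settings look in XGBoost's Python API (nrounds is the R-package name; the Python cv()/train() functions call it num_boost_round). The data below is random placeholder data:

import numpy as np
import xgboost as xgb

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)  # binary labels for binary:logistic
dtrain = xgb.DMatrix(X, label=y)

params = {"objective": "binary:logistic", "max_depth": 4}
booster = xgb.train(params, dtrain, num_boost_round=20)  # 20 boosting rounds, i.e. 20 trees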
You can use GridSearchCV with XGBoost through the XGBoost sklearn API.
Define your classifier as follows:
from xgboost.sklearn import XGBClassifier
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in modern scikit-learn
xgb_model = XGBClassifier(**other_params)  # other_params: a dict of any fixed keyword arguments
test_params = {
    'max_depth': [4, 8, 12]
}
model = GridSearchCV(estimator=xgb_model, param_grid=test_params)
model.fit(train, target)
print(model.best_params_)
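After fitting, model.best_params_ reports which max_depth from the grid performed best, and model.best_score_ gives the corresponding mean cross-validated score.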
Cross-validation is used for estimating the performance of one set of parameters on unseen data.
Grid-search evaluates a model with varying parameters to find the best possible combination of these.
The sklearn docs talk a lot about CV, and the two can be used in combination, but each has a very different purpose. You might be able to fit xgboost into sklearn's grid-search functionality; check out the sklearn interface to xgboost for the smoothest integration, as in the sketch below.
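To make the distinction concrete, here is a small sketch using the sklearn-compatible interface; the cv=5 fold count and the candidate values are illustrative:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from xgboost import XGBClassifier

X, y = load_iris(return_X_y=True)

# Cross-validation: estimate performance of ONE parameter setting on held-out folds.
scores = cross_val_score(XGBClassifier(max_depth=4), X, y, cv=5)
print("CV accuracy for max_depth=4:", scores.mean())

# Grid search: run that cross-validation for EVERY candidate setting and keep the best.
search = GridSearchCV(XGBClassifier(), param_grid={"max_depth": [2, 4, 6]}, cv=5)
search.fit(X, y)
print("Best setting:", search.best_params_)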