GridSearchCV - XGBoost - Early Stopping

Tags:

python-3.x

scikit-learn

regression

data-science

xgboost

I am trying to do a hyperparameter search on XGBoost using scikit-learn's GridSearchCV. During the grid search I'd like it to stop early, since that reduces search time drastically and (I expect) gives better results on my prediction/regression task. I am using XGBoost via its scikit-learn API.

    model = xgb.XGBRegressor()
    GridSearchCV(model, paramGrid, verbose=verbose,
                 fit_params={'early_stopping_rounds': 42},
                 cv=TimeSeriesSplit(n_splits=cv).get_n_splits([trainX, trainY]),
                 n_jobs=n_jobs, iid=iid).fit(trainX, trainY)

I tried to pass the early stopping parameters via fit_params, but then it throws the error below, which is basically caused by the lack of a validation set, which early stopping requires:

    /opt/anaconda/anaconda3/lib/python3.5/site-packages/xgboost/callback.py in callback(env=XGBoostCallbackEnv(model=<xgboost.core.Booster o...teration=4000, rank=0, evaluation_result_list=[]))
        187         else:
        188             assert env.cvfolds is not None
        189
        190     def callback(env):
        191         """internal function"""
    --> 192         score = env.evaluation_result_list[-1][1]
            score = undefined
            env.evaluation_result_list = []
        193         if len(state) == 0:
        194             init(env)
        195         best_score = state['best_score']
        196         best_iteration = state['best_iteration']

How can I apply GridSearch on XGBoost with early_stopping_rounds?

Note: the model works without GridSearch, and GridSearch works without fit_params={'early_stopping_rounds':42}.

asked Mar 24 '17 07:03

ayyayyekokojambo

2 Answers

When using early_stopping_rounds you also have to pass eval_metric and eval_set as input parameters to the fit method. Early stopping works by calculating the error on an evaluation set. The error has to decrease at least once every early_stopping_rounds rounds; otherwise the generation of additional trees is stopped early.
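The stopping rule described above can be sketched in plain Python. This is only a simplified illustration, not XGBoost's own code: the helper name `early_stopping_round`, its `patience` argument (standing in for early_stopping_rounds) and the error values are all hypothetical.

```python
def early_stopping_round(errors, patience):
    """Return (stop_round, best_round) for a sequence of validation errors.

    Training stops once `patience` rounds have passed without a new
    lowest error; if that never happens, every round is used.
    """
    best_error = float("inf")
    best_round = -1
    for round_idx, error in enumerate(errors):
        if error < best_error:
            # New best score: remember it and keep training.
            best_error = error
            best_round = round_idx
        elif round_idx - best_round >= patience:
            # No improvement for `patience` rounds: stop here.
            return round_idx, best_round
    return len(errors) - 1, best_round


# Hypothetical per-round validation MAE values, for illustration only.
val_mae = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.59]
stop_round, best_round = early_stopping_round(val_mae, patience=3)
print(stop_round, best_round)  # 5 2
```

With an empty list of evaluation results there is no score to compare at all, which is the situation the traceback in the question shows (`evaluation_result_list = []`).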

See the documentation of XGBoost's fit method for details.

Here is a minimal, fully working example:

    import xgboost as xgb
    from sklearn.model_selection import GridSearchCV
    from sklearn.model_selection import TimeSeriesSplit

    cv = 2

    trainX = [[1], [2], [3], [4], [5]]
    trainY = [1, 2, 3, 4, 5]

    # these are the evaluation sets
    testX = trainX
    testY = trainY

    paramGrid = {"subsample": [0.5, 0.8]}

    fit_params = {"early_stopping_rounds": 42,
                  "eval_metric": "mae",
                  "eval_set": [[testX, testY]]}

    model = xgb.XGBRegressor()
    gridsearch = GridSearchCV(model, paramGrid, verbose=1,
                              fit_params=fit_params,
                              cv=TimeSeriesSplit(n_splits=cv).get_n_splits([trainX, trainY]))
    gridsearch.fit(trainX, trainY)

answered Oct 07 '22 23:10

glao

An update to @glao's answer and a response to @Vasim's comment/question, as of sklearn 0.21.3. Note that fit_params is no longer passed when instantiating GridSearchCV; it is passed to the fit() method instead. Also, the import specifically pulls in the sklearn wrapper module from xgboost:

    import xgboost.sklearn as xgb
    from sklearn.model_selection import GridSearchCV
    from sklearn.model_selection import TimeSeriesSplit

    cv = 2

    trainX = [[1], [2], [3], [4], [5]]
    trainY = [1, 2, 3, 4, 5]

    # these are the evaluation sets
    testX = trainX
    testY = trainY

    paramGrid = {"subsample": [0.5, 0.8]}

    fit_params = {"early_stopping_rounds": 42,
                  "eval_metric": "mae",
                  "eval_set": [[testX, testY]]}

    model = xgb.XGBRegressor()

    gridsearch = GridSearchCV(model, paramGrid, verbose=1,
                              cv=TimeSeriesSplit(n_splits=cv).get_n_splits([trainX, trainY]))

    gridsearch.fit(trainX, trainY, **fit_params)
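The examples above reuse the training data as the evaluation set to stay short. For time-ordered data, one option is to hold out the most recent samples for eval_set instead. The following is a minimal sketch under that assumption; `chronological_holdout` and its `holdout_fraction` argument are hypothetical names, not part of sklearn or xgboost.

```python
def chronological_holdout(X, y, holdout_fraction=0.2):
    """Split time-ordered data into an earlier fit part and a later eval part."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    # Keep at least one sample for evaluation.
    n_eval = max(1, int(len(X) * holdout_fraction))
    split = len(X) - n_eval
    if split < 1:
        raise ValueError("not enough samples to hold out an evaluation set")
    return X[:split], y[:split], X[split:], y[split:]


trainX = [[1], [2], [3], [4], [5]]
trainY = [1, 2, 3, 4, 5]

fitX, fitY, evalX, evalY = chronological_holdout(trainX, trainY, holdout_fraction=0.2)

# The held-out tail takes the place of testX/testY in fit_params.
fit_params = {
    "early_stopping_rounds": 42,
    "eval_metric": "mae",
    "eval_set": [[evalX, evalY]],
}
print(evalX, evalY)  # [[5]] [5]
```

The grid search would then be fitted on `fitX, fitY` only, so that the evaluation samples never appear in a training fold. Also note that `TimeSeriesSplit(n_splits=cv).get_n_splits(...)` returns just the integer number of splits; passing the `TimeSeriesSplit(n_splits=cv)` object itself as `cv` keeps the folds time-ordered.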

answered Oct 08 '22 00:10

emigre459
