Below is a question about the xgboost early_stopping_rounds parameter and whether or not it returns the best iteration when it is the reason the fit ends.
In the xgboost documentation, one can see in the scikit-learn API section (link) that when the fit stops because of the early_stopping_rounds parameter:
Activates early stopping. Validation error needs to decrease at least every "early_stopping_rounds" round(s) to continue training. Requires at least one item in evals. If there’s more than one, will use the last. Returns the model from the last iteration (not the best one).
When reading this, it seems that the model returned in this case is not the best one but the last one. To access the best one when predicting, it says, it is possible to call predict with the ntree_limit parameter set to the bst.best_ntree_limit given at the end of the fit.
In this sense, it should work the same way as xgboost's train, since the fit of the scikit-learn API seems to be only a wrapper around train and the other native functions.
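For reference, this is roughly the pattern the documentation describes for the native train API; it is only a sketch, with placeholder parameter values rather than my actual code:

import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

bst = xgb.train(
    {"max_depth": 5, "eval_metric": "rmse"},
    dtrain,
    num_boost_round=100,
    evals=[(dtrain, "train"), (dvalid, "valid")],
    early_stopping_rounds=6,
)

# as documented, the returned booster keeps every tree, so the best iteration
# has to be requested explicitly at prediction time
y_pred = bst.predict(dvalid, ntree_limit=bst.best_ntree_limit)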
It is widely discussed here: stack overflow discussion, or here: another discussion.
But when I tried to address this problem and check how it worked with my data, I did not find the behavior I thought I should have. In fact, the behavior I encountered was not at all the one described in those discussions and in the documentation.
I call a fit this way:
reg = xgb.XGBRegressor(n_jobs=6, n_estimators=100, max_depth=5)
reg.fit(
    X_train,
    y_train,
    eval_metric='rmse',
    eval_set=[(X_train, y_train), (X_valid, y_valid)],
    verbose=True,
    early_stopping_rounds=6,
)
and here is what I get in the end:
[71] validation_0-rmse:1.70071 validation_1-rmse:1.9382
[72] validation_0-rmse:1.69806 validation_1-rmse:1.93825
[73] validation_0-rmse:1.69732 validation_1-rmse:1.93803
Stopping. Best iteration:
[67] validation_0-rmse:1.70768 validation_1-rmse:1.93734
and when I check the predictions on the validation set I used:
from math import sqrt
from sklearn.metrics import mean_squared_error as mse
import pandas as pd
y_pred_valid = reg.predict(X_valid)
y_pred_valid_df = pd.DataFrame(y_pred_valid)
sqrt(mse(y_valid, y_pred_valid_df[0]))
I get
1.9373418403889535
If the fit had returned the last iteration instead of the best one, it should have given an rmse around 1.93803, but it gave an rmse of 1.93734, exactly the best score.
I checked again in two ways: [Edit] I've edited the code below according to @Eran Moshe's answer
y_pred_valid = reg.predict(X_valid, ntree_limit=reg.best_ntree_limit)
y_pred_valid_df = pd.DataFrame(y_pred_valid)
sqrt(mse(y_valid, y_pred_valid_df[0]))
1.9373418403889535
and even if I call the fit (knowing the best iteration is the 67th) with only 68 estimators, so that I'm sure the last one is the best one:
reg = xgb.XGBRegressor(n_jobs=6, n_estimators=68, max_depth=5)
reg.fit(
    X_train,
    y_train,
    eval_metric='rmse',
    eval_set=[(X_train, y_train), (X_valid, y_valid)],
    verbose=True,
    early_stopping_rounds=6,
)
the result is the same:
1.9373418403889535
So that seems to lead to the idea that, unlike what the documentation and those numerous discussions say, the fit of xgboost, when stopped by the early_stopping_rounds parameter, does give the best iteration, not the last one.
Am I wrong? If so, where, and how do you explain the behavior I observed?
Thanks for your attention.
Early stopping is a technique used to stop training when the loss on the validation dataset starts to increase (in the case of minimizing the loss). That's why to train a model (any model, not only Xgboost) you need two separate datasets: training data for model fitting, and validation data for loss monitoring and early stopping.
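As a minimal sketch of that setup (assuming the full data is in X and y; the names and split ratio here are only illustrative):

from sklearn.model_selection import train_test_split

# hold out a validation set the model never trains on, so the metric
# monitored for early stopping is computed on unseen data
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)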
During the tree-building process, XGBoost automatically stops splitting a node if it does not have enough cover (the sum of the Hessians of all the data falling into that node) or if the tree reaches the maximum depth.
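Those per-tree stopping criteria map to the usual constructor parameters; as a sketch (the values here are just examples, not recommendations):

reg = xgb.XGBRegressor(
    max_depth=5,          # hard cap on the depth of each tree
    min_child_weight=1,   # minimum sum of Hessians (cover) required in a child node
)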
When modelling with xgboost, if there are too many variables the model is prone to overfitting; if there are too few variables, it easily underfits.
I think it is not wrong, but inconsistent.

The documentation of the predict method is correct (e.g. see here). To be 100% sure it is better to look into the code: xgb github. So predict behaves as stated in its documentation, but the fit documentation is outdated. Please post it as an issue on the XGB github, and either they will fix the docs or you will, and you will become an XGB contributor :)
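In other words, the behavior you observed comes from predict itself. A simplified sketch of what the scikit-learn wrapper's predict does internally in the versions discussed here (paraphrased, not a verbatim copy of the source; the function name is just for illustration):

def sklearn_style_predict(model, data, ntree_limit=None):
    # if no ntree_limit is passed, fall back to best_ntree_limit when early
    # stopping has set it; 0 means "use all trees"
    if ntree_limit is None:
        ntree_limit = getattr(model, "best_ntree_limit", 0)
    return model.get_booster().predict(xgb.DMatrix(data), ntree_limit=ntree_limit)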
You have a code error there.
Notice how
reg.predict(X_valid, ntree_limit=reg.best_ntree_limit)
Should be
y_pred_valid = reg.predict(X_valid, ntree_limit=reg.best_ntree_limit)
So in fact you're making the same comparison when calculating
sqrt(mse(y_valid, y_pred_valid_df[0]))
Xgboost is working just as you've read. early_stopping_rounds = x will train until the validation metric hasn't improved for x consecutive rounds.

And when predicting with ntree_limit=y it'll use ONLY the first y boosters.
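If you want to check which trees the default predict actually uses, a quick sketch (assuming the reg fitted with early stopping above, and an XGBoost version that still supports ntree_limit) is to compare it against explicit ntree_limit values:

import numpy as np

pred_default = reg.predict(X_valid)
pred_best = reg.predict(X_valid, ntree_limit=reg.best_ntree_limit)  # only the trees up to the best iteration
pred_all = reg.predict(X_valid, ntree_limit=0)                      # 0 means use all trained trees

# given the rmse values reported above, the default should match the best iteration
print(np.allclose(pred_default, pred_best))
print(np.allclose(pred_default, pred_all))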