When I use xgboost to train a model for a two-class (binary) classification problem, I'd like to use early stopping to get the best model, but I'm confused about which value to use in my predict call, since early stopping sets three different attributes.
For example, should I use
preds = bst.predict(xgtest, ntree_limit=bst.best_iteration)
or should I use
preds = bst.predict(xgtest, ntree_limit=bst.best_ntree_limit)
or are both correct, each suited to different circumstances? If so, how can I judge which one to use?
Here is the relevant quotation from the xgboost documentation, but it doesn't give a reason, and I couldn't find any comparison between these parameters:
Early Stopping
If you have a validation set, you can use early stopping to find the optimal number of boosting rounds. Early stopping requires at least one set in evals. If there's more than one, it will use the last.
train(..., evals=evals, early_stopping_rounds=10)
The model will train until the validation score stops improving. Validation error needs to decrease at least every early_stopping_rounds to continue training.
If early stopping occurs, the model will have three additional fields: bst.best_score, bst.best_iteration and bst.best_ntree_limit. Note that train() will return a model from the last iteration, not the best one.
Prediction
A model that has been trained or loaded can perform predictions on data sets.
# 7 entities, each contains 10 features
data = np.random.rand(7, 10)
dtest = xgb.DMatrix(data)
ypred = bst.predict(dtest)
If early stopping is enabled during training, you can get predictions from the best iteration with bst.best_ntree_limit:
ypred = bst.predict(dtest, ntree_limit=bst.best_ntree_limit)
Thanks in advance.
DMatrix is an internal data structure used by XGBoost, optimized for both memory efficiency and training speed. You can construct a DMatrix from multiple different data sources (its data argument accepts an os.PathLike/string path, a NumPy array, and several other formats).
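For instance, a minimal sketch of building a DMatrix from two of those sources (the toy shapes and density here are made up):

import numpy as np
import scipy.sparse
import xgboost as xgb

# From a dense NumPy array: 100 rows, 10 features, binary labels.
dense = xgb.DMatrix(np.random.rand(100, 10), label=np.random.randint(2, size=100))

# From a SciPy CSR sparse matrix.
sparse = xgb.DMatrix(scipy.sparse.random(100, 10, density=0.2, format="csr"))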
Early stopping is a technique used to stop training when the loss on the validation dataset starts to increase (in the case of minimizing the loss). That's why, to train a model (any model, not only Xgboost), you need two separate datasets: training data for model fitting, and validation data for loss monitoring and early stopping.
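As a minimal sketch of that setup (toy data and parameter values are made up for illustration, and it assumes an xgboost 1.x version where best_ntree_limit still exists):

import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Toy binary-classification data.
X = np.random.rand(500, 10)
y = np.random.randint(2, size=500)

# Two separate datasets: one to fit the trees, one to monitor the loss.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

params = {"objective": "binary:logistic", "eval_metric": "logloss"}

# Training stops once the validation logloss has failed to improve
# for 10 consecutive rounds; the last entry in evals is the one monitored.
bst = xgb.train(
    params,
    dtrain,
    num_boost_round=1000,
    evals=[(dtrain, "train"), (dvalid, "valid")],
    early_stopping_rounds=10,
)

# The three fields described in the documentation quoted above:
print(bst.best_score, bst.best_iteration, bst.best_ntree_limit)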
From my point of view, both parameters refer to the same thing, or at least have the same goal. But I would rather build on best_iteration:
preds = bst.predict(xgtest, ntree_limit=bst.best_iteration + 1)
(best_iteration is a zero-based round index, while ntree_limit counts trees, hence the + 1.)
From the xgboost source code, we can see that best_ntree_limit is going to be dropped in favor of best_iteration:
def _get_booster_layer_trees(model: "Booster") -> Tuple[int, int]:
    """Get number of trees added to booster per-iteration. This function will be removed
    once `best_ntree_limit` is dropped in favor of `best_iteration`. Returns
    `num_parallel_tree` and `num_groups`.
    """
Additionally, best_ntree_limit has been removed from the EarlyStopping documentation page. So I think this attribute exists only for backwards-compatibility reasons. From this code snippet and the documentation, we can therefore assume that best_ntree_limit is, or soon will be, deprecated.
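Finally, if you are on xgboost >= 1.4, ntree_limit itself is deprecated on predict in favor of iteration_range, which is expressed directly in zero-based, half-open iteration indices. A minimal sketch, reusing the bst and xgtest names from above:

# iteration_range is half-open over boosting rounds:
# (0, best_iteration + 1) keeps rounds 0 .. best_iteration inclusive.
preds = bst.predict(xgtest, iteration_range=(0, bst.best_iteration + 1))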