How to save & load xgboost model? [closed]

Tags: python, save, machine-learning, xgboost

From the XGBoost guide:

After training, the model can be saved.
bst.save_model('0001.model')
The model and its feature map can also be dumped to a text file.
# dump model
bst.dump_model('dump.raw.txt')
# dump model with feature map
bst.dump_model('dump.raw.txt', 'featmap.txt')
A saved model can be loaded as follows:
bst = xgb.Booster({'nthread': 4})  # init model
bst.load_model('model.bin')  # load data

My questions are as follows.

1. What's the difference between save_model and dump_model?
2. What's the difference between saving '0001.model' and saving 'dump.raw.txt' with 'featmap.txt'?
3. Why is the model name used for loading (model.bin) different from the name used for saving (0001.model)?
4. Suppose that I trained two models, model_A and model_B, and want to save both for future use. Which save and load functions should I use? Could you show the process clearly?

662

asked Apr 29 '17 03:04

Pengju Zhao

2 Answers

Here is how I solved the problem:

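import pickle

file_name = "xgb_reg.pkl"

# save (the context manager closes the file handle afterwards)
with open(file_name, "wb") as f:
    pickle.dump(xgb_model, f)

# load
with open(file_name, "rb") as f:
    xgb_model_loaded = pickle.load(f)

# test: the loaded model should reproduce the original model's prediction
# (xgb_model and X_val come from the earlier training step)
ind = 1
test = X_val[ind]
xgb_model_loaded.predict(test)[0] == xgb_model.predict(test)[0]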

Out[1]: True

186

answered Oct 17 '22 07:10

ChrisDanger

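Both save_model and dump_model write the model to disk. The difference is that dump_model writes the trees in a human-readable text format and can include feature names (through a feature map), whereas save_model writes a file that can be loaded back.

load_model only works with files produced by save_model. The output of dump_model cannot be loaded back into XGBoost; it is meant for inspection or for tools such as xgbfi.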

When loading a model, you need to pass the path where the model was saved. In the example, bst.load_model("model.bin") loads the model from the file model.bin. That is just a file name, so the guide's loading example simply uses a different name than its saving example; to load the model saved earlier you would pass '0001.model'. Good luck!

EDIT: According to the XGBoost documentation (version 1.3.3), dump_model() should be used to save the model for further interpretation, while save_model() and load_model() should be used for saving and loading the model. Please check the docs for more details.

There is also a difference between the Learning API and the Scikit-Learn API of XGBoost. The latter stores the best_ntree_limit variable, which is set during training with early stopping. You can read the details in my article How to save and load Xgboost in Python?

The save_model() method picks the output format from the file name: if the name ends in .json, the model is saved as JSON; otherwise, in the 1.x releases this answer refers to, it is saved in XGBoost's internal binary format.

answered Oct 17 '22 06:10

pplonski
