I'm building a decision tree using Scikit-Learn in Python. I've trained the model on a particular dataset and now I want to save this decision tree so that it can be used later (on a new dataset). Does anyone know how to do this?
Saving and loading Scikit-Learn models is part of the lifecycle of most models: typically, you train them in one runtime and serve them in another. Once the model is fit, you can save it. Note: if the data was scaled before training, new data needs the same scaling before it is passed to the loaded model.
As taken from the Model Persistence section of this tutorial:
It is possible to save a model in scikit-learn by using Python’s built-in persistence model, namely pickle:
>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
>>> clf2.predict(X[0:1])  # predict expects a 2D array, so slice instead of indexing
array([0])
>>> y[0]
0
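For the decision tree in the question, the same idea can be used to write the model to a file and read it back later. The following is only a minimal sketch: the iris data stands in for your own dataset, and the file name `decision_tree.pkl` and the temporary directory are assumptions made for illustration.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; replace with your own training set.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Hypothetical location for the saved model.
model_path = os.path.join(tempfile.mkdtemp(), "decision_tree.pkl")

# Save the fitted tree to disk.
with open(model_path, "wb") as f:
    pickle.dump(tree, f)

# Later, possibly in another process: load it back.
# Only unpickle files that come from a source you trust.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

# The restored tree should predict exactly like the original one.
same_predictions = bool((restored.predict(X) == tree.predict(X)).all())
print(same_predictions)
```

The loaded object is a regular `DecisionTreeClassifier`, so `restored.predict(new_X)` works on a new dataset as long as it has the same features as the training data.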
There is currently no way of doing this that is reliable across scikit-learn versions. Pickling works, but a pickled model is not guaranteed to unpickle correctly with a later version of scikit-learn.
Quote from: http://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
Models saved in one version of scikit-learn might not load in another version.
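One way to make a version mismatch visible is to store the scikit-learn version next to the pickled model and compare it when loading. This is a rough sketch under my own assumptions, not something scikit-learn provides: the helper names `dump_with_version` and `load_with_version_check` and the bundle layout are hypothetical.

```python
import pickle
import warnings

import sklearn
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def dump_with_version(model):
    """Hypothetical helper: bundle the pickled model with the sklearn version."""
    bundle = {
        "sklearn_version": sklearn.__version__,
        # The model is pickled separately so the version can be read
        # before any scikit-learn object is unpickled.
        "model_bytes": pickle.dumps(model),
    }
    return pickle.dumps(bundle)


def load_with_version_check(payload):
    """Hypothetical helper: warn if the installed version differs, then load."""
    bundle = pickle.loads(payload)
    saved_version = bundle["sklearn_version"]
    if saved_version != sklearn.__version__:
        warnings.warn(
            f"Model was saved with scikit-learn {saved_version}, "
            f"but {sklearn.__version__} is installed; loading may fail."
        )
    return pickle.loads(bundle["model_bytes"])


# Usage with stand-in data.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

payload = dump_with_version(tree)
restored = load_with_version_check(payload)
```

The check does not make the pickle portable. It only tells you which version to install, or that the model should be retrained, when the load misbehaves.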