
Save a decision tree model in scikit

I'm building a decision tree using scikit-learn in Python. I've trained the model on a particular dataset, and now I want to save the trained tree so that it can be used later on a new dataset. Does anyone know how to do this?

nEO, asked Oct 01 '14 11:10




2 Answers

As taken from the Model Persistence section of this tutorial:

It is possible to save a model in the scikit by using Python’s built-in persistence model, namely pickle:

>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> clf.fit(X, y)  
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
>>> clf2.predict(X[0:1])  # predict expects a 2D array, so slice instead of indexing
array([0])
>>> y[0]
0
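
The snippet above only round-trips the model through an in-memory string. For the decision tree in the question, a minimal sketch of writing the model to disk and reading it back might look like the following. The file name `decision_tree.pkl`, the use of the temp directory, and the iris data are assumptions made for illustration, not part of the tutorial.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Train a small decision tree (iris is only a stand-in dataset).
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Hypothetical location for the saved model.
model_path = os.path.join(tempfile.gettempdir(), "decision_tree.pkl")

# Save the fitted model to disk.
with open(model_path, "wb") as f:
    pickle.dump(tree, f)

# Later, possibly in another process: load it back and predict.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

# The restored tree should give the same predictions as the original.
same_predictions = bool((restored.predict(X) == tree.predict(X)).all())
print(same_predictions)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.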
Matthew Spencer, answered Oct 08 '22 21:10


There is currently no way of doing this that is reliable across scikit-learn versions. Pickling does work, but a pickled model is not guaranteed to unpickle properly with a later version of scikit-learn.

Quote from: http://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations

Models saved in one version of scikit-learn might not load in another version.
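
One way to at least detect this situation is to store the scikit-learn version next to the model and compare it on load. The sketch below is one possible approach, not something the scikit-learn documentation prescribes; `save_model` and `load_model` are hypothetical helper names. The version string is pickled first and separately, so it can be checked before the model itself is unpickled.

```python
import io
import pickle

import sklearn
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def save_model(model, fileobj):
    """Write the current scikit-learn version, then the model (hypothetical helper)."""
    pickle.dump(sklearn.__version__, fileobj)
    pickle.dump(model, fileobj)


def load_model(fileobj):
    """Check the stored version before unpickling the model (hypothetical helper)."""
    saved_version = pickle.load(fileobj)
    if saved_version != sklearn.__version__:
        raise RuntimeError(
            f"model was saved with scikit-learn {saved_version}, "
            f"but {sklearn.__version__} is installed"
        )
    return pickle.load(fileobj)


X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# An in-memory buffer stands in for a file opened in binary mode.
buffer = io.BytesIO()
save_model(tree, buffer)
buffer.seek(0)
restored = load_model(buffer)
```

A strict equality check is deliberately conservative. Whether a looser rule is safe depends on the versions involved, so retraining from the original data remains the fallback when the check fails.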

Bastian Venthur, answered Oct 08 '22 21:10