How to save to disk an sklearn model with its out-of-file dependencies?

I want to save to disk an sklearn Pipeline that includes a custom Preprocessor and a RandomForestClassifier, with all of its dependencies inside the saved file. Without this, I have to copy all the dependencies (custom modules) into the same folder everywhere I want to call this model (in my case, on a remote server).

The preprocessor is defined in a class that lives in another file (preprocessing.py) in the same folder of my project, so I access it through an import.

training.py

from preprocessing import Preprocessor

from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
import pickle

clf = Pipeline([
    ("preprocessing", Preprocessor()),
    ("model", RandomForestClassifier()),
])

# some fitting of the classifier
# ...

# Export
with open(savepath, "wb") as handle:
    pickle.dump(clf, handle, protocol=pickle.HIGHEST_PROTOCOL)

I tried pickle (and some of its variants), dill, and joblib, but none of them worked: when I load the .pkl somewhere else (say, on my remote server), I must have an identical preprocessing.py in the project structure, which is a pain.

What I would love is to have another file somewhere else:
remote.py

import pickle

with open(savepath, "rb") as handle:
    model = pickle.load(handle)

print(model.predict(some_matrix))

But this code currently raises an error because it cannot find the Preprocessor class.
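For context, the failure can be reproduced with the standard library alone. The sketch below is only an illustration: `demo_preprocessing` and the body of `Preprocessor` are hypothetical stand-ins for preprocessing.py, built in memory so the snippet is self-contained. It suggests that pickle records the module and class name of an object, not the class code, so loading fails once the module is no longer importable.

```python
import pickle
import sys
import types

# Hypothetical stand-in for preprocessing.py, built in memory for the demo.
MODULE_NAME = "demo_preprocessing"
SOURCE = '''
class Preprocessor:
    def __init__(self, scale=2.0):
        self.scale = scale

    def transform(self, values):
        return [v * self.scale for v in values]
'''

module = types.ModuleType(MODULE_NAME)
exec(SOURCE, module.__dict__)
sys.modules[MODULE_NAME] = module

# "Training machine": the module is importable, so dumping works.
payload = pickle.dumps(module.Preprocessor(scale=3.0))

# The bytes record where to find the class and the instance state,
# but not the body of the class.
has_module_name = MODULE_NAME.encode() in payload
has_class_name = b"Preprocessor" in payload
has_method_code = b"transform" in payload

# "Remote server": the module is gone, so loading fails.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(payload)
    load_error = None
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    load_error = exc

print(has_module_name, has_class_name, has_method_code)
print(type(load_error).__name__)
```

joblib relies on the same pickle mechanism, which would explain why switching to it does not remove the need for preprocessing.py on the remote side.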

asked Mar 28 '19 13:03 by maxJu


1 Answer

I'm facing an identical issue right now. To address it, I am going to turn my pipeline/model, along with all its dependencies (the preprocessing classes), into a Python module using setuptools, so that it is self-contained and can be run anywhere (remote server, Docker container, VM).

I'm currently going through this process and if this is something you are interested in, I can respond with the additional steps spelled out as I make progress.
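As a rough illustration of the packaging approach described above, the script below writes a minimal installable layout to a temporary folder. Everything named here is an assumption for the sake of the example: the distribution name `my_model_pkg`, the version, and the placeholder body of `Preprocessor` are hypothetical, and the real class would come from the project's own preprocessing.py.

```python
from pathlib import Path
import tempfile

# Hypothetical setup.py for a package that ships the custom preprocessing code.
SETUP_PY = '''\
from setuptools import setup, find_packages

setup(
    name="my_model_pkg",  # hypothetical distribution name
    version="0.1.0",
    packages=find_packages(),
    install_requires=["scikit-learn"],
)
'''

# Placeholder module; the real Preprocessor body would be copied here.
PREPROCESSING_PY = '''\
from sklearn.base import BaseEstimator, TransformerMixin


class Preprocessor(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X
'''


def write_package_skeleton(root):
    """Create setup.py and the package folder under root; return the files written."""
    root = Path(root)
    package_dir = root / "my_model_pkg"
    package_dir.mkdir(parents=True, exist_ok=True)
    (root / "setup.py").write_text(SETUP_PY)
    (package_dir / "__init__.py").write_text("")
    (package_dir / "preprocessing.py").write_text(PREPROCESSING_PY)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))


project_root = Path(tempfile.mkdtemp())
created = write_package_skeleton(project_root)
print(created)
```

With a layout like this, running `pip install .` on both the training machine and the remote server would make `from my_model_pkg.preprocessing import Preprocessor` resolve in both places. The pipeline would need to be pickled after importing the class from that package path, since the pickle stores the module path it was created with.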

answered Oct 16 '22 05:10 by gdv820