Check sklearn version before loading model using joblib

I've followed this guide to save a machine learning model for later use. The model was dumped on one machine:

from sklearn.externals import joblib
joblib.dump(clf, 'model.pkl')

When I loaded it with joblib.load('model.pkl') on another machine, I got this warning:

UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version pre-0.18 when using version 0.18.1. This might lead to breaking code or invalid results. Use at your own risk.

So is there any way to know the sklearn version of the saved model to compare it with the current version?

Canh asked Dec 14 '16 15:12




1 Answer

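Versioning of pickled estimators was added in scikit-learn 0.18. Starting from v0.18, the scikit-learn version is written into the pickled state under the '_sklearn_version' key. On the machine that created the estimator, you can inspect the value that will be stored with: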

estimator.__getstate__()['_sklearn_version']

The warning you get is produced by the estimator's __setstate__ method, which is called automatically upon unpickling. There does not appear to be a straightforward way of getting this version without loading the estimator from disk. You can filter out the warning with:

import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    estimator = joblib.load('model.pkl')
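
If you would rather detect the mismatch than silence it, the same context manager can record warnings instead of ignoring them. Below is a minimal sketch; load_and_collect_warnings is a hypothetical helper name, not part of joblib or scikit-learn, and it assumes the version mismatch keeps being reported as a UserWarning, as in the message quoted in the question.

```python
import warnings


def load_and_collect_warnings(loader, *args, **kwargs):
    """Call loader(*args, **kwargs) and return (obj, UserWarning messages).

    `loader` is any callable, for example joblib.load. This helper is a
    hypothetical illustration, not an API of joblib or scikit-learn.
    """
    with warnings.catch_warnings(record=True) as caught:
        # "always" records the warning even if it was already shown once.
        warnings.simplefilter("always")
        obj = loader(*args, **kwargs)
    # Keep only UserWarning (and subclasses); other warnings raised
    # during the call are recorded and then dropped.
    messages = [
        str(w.message) for w in caught if issubclass(w.category, UserWarning)
    ]
    return obj, messages
```

With this, something like estimator, messages = load_and_collect_warnings(joblib.load, 'model.pkl') would let you check whether messages is non-empty and decide what to do, rather than discarding the information.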

For pre-0.18 versions there is no such mechanism, but I imagine you could, for instance, use not hasattr(estimator, '__getstate__') as a test to detect, at least, pre-0.18 versions.
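
Since the version cannot easily be read without unpickling, one workaround is to record it yourself in a small file next to the model at dump time. The sketch below assumes a hypothetical ".meta.json" sidecar convention and hypothetical helper names; none of this is built into joblib or scikit-learn.

```python
import json


def write_version_sidecar(model_path, library_version):
    """Write the library version next to the model file.

    Returns the path of the sidecar file. The ".meta.json" suffix and the
    "sklearn_version" key are conventions made up for this example.
    """
    sidecar_path = model_path + ".meta.json"
    with open(sidecar_path, "w") as fh:
        json.dump({"sklearn_version": library_version}, fh)
    return sidecar_path


def read_version_sidecar(model_path):
    """Return the recorded version string, or None if no sidecar exists."""
    try:
        with open(model_path + ".meta.json") as fh:
            return json.load(fh).get("sklearn_version")
    except FileNotFoundError:
        return None
```

On the training machine you would call write_version_sidecar('model.pkl', sklearn.__version__) right after joblib.dump(clf, 'model.pkl'). On the loading machine, compare read_version_sidecar('model.pkl') with sklearn.__version__ before calling joblib.load.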

rth answered Oct 02 '22 09:10