I followed this guide to save a machine learning model for later use. The model was dumped on one machine:
import joblib  # formerly `from sklearn.externals import joblib`, removed in scikit-learn 0.23
joblib.dump(clf, 'model.pkl')
When I loaded it with joblib.load('model.pkl') on another machine, I got this warning:
UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version pre-0.18 when using version 0.18.1. This might lead to breaking code or invalid results. Use at your own risk.
So is there any way to know the sklearn version of the saved model to compare it with the current version?
How do I check my sklearn version? Use sklearn.__version__ to display the installed version of scikit-learn.
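For example, from Python (recent scikit-learn releases also provide sklearn.show_versions(), which prints the versions of its dependencies as well):

```python
import sklearn

# The installed scikit-learn version as a string
installed_version = sklearn.__version__
print(installed_version)
```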
Joblib has an Apache Spark extension: joblib-spark. Scikit-learn can use this extension to train estimators in parallel on all the workers of your Spark cluster without significantly changing your code. Note that this requires scikit-learn>=0.21 and pyspark>=2.4.
TL;DR: joblib is faster at saving/loading large NumPy arrays, whereas pickle is faster with large collections of Python objects. Therefore, if your model contains large NumPy arrays (as most models do), joblib should be faster.
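A rough way to check this on your own data is to time both serializers on a model-like payload. This is only a sketch: the payload (one large float array plus a small list) is made up for illustration, timings depend heavily on the machine, and with pickle protocol 5 (the highest protocol since Python 3.8) plain pickle also handles large NumPy arrays fairly efficiently, so the gap may be small.

```python
import os
import pickle
import tempfile
import time

import joblib
import numpy as np

# Hypothetical "model-like" payload: one large float array plus small metadata
payload = {"weights": np.random.rand(2_000_000), "classes": ["a", "b"]}


def pickle_save(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def pickle_load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def time_roundtrip(save, load, obj, path):
    """Return (seconds to save, seconds to load, loaded object)."""
    t0 = time.perf_counter()
    save(obj, path)
    t1 = time.perf_counter()
    loaded = load(path)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, loaded


with tempfile.TemporaryDirectory() as tmp:
    results = {
        "pickle": time_roundtrip(pickle_save, pickle_load, payload, os.path.join(tmp, "m.pickle")),
        "joblib": time_roundtrip(joblib.dump, joblib.load, payload, os.path.join(tmp, "m.joblib")),
    }

for name, (save_s, load_s, _) in results.items():
    print(f"{name}: save {save_s:.3f}s, load {load_s:.3f}s")
```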
joblib.load can also load NumPy array files that were saved separately during the dump. If the mmap_mode argument is given, it is passed to np.load and the arrays are loaded as memmaps. As a consequence, the reconstructed object might not match the original pickled object.
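A short illustration of mmap_mode, assuming an uncompressed joblib dump (joblib ignores mmap_mode for compressed files): the array comes back as a numpy.memmap rather than a plain ndarray, which is one way the loaded object can differ from the original.

```python
import os
import tempfile

import joblib
import numpy as np

original = np.arange(1_000_000, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "array.joblib")
    joblib.dump(original, path)  # uncompressed, so the array can be memory-mapped

    # mmap_mode="r" maps the array read-only from disk instead of reading it into RAM
    mapped = joblib.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)
    same_values = bool(np.array_equal(mapped, original))
    print(type(mapped).__name__, is_memmap, same_values)
    del mapped  # release the file mapping before the directory is removed
```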
Versioning of pickled estimators was added in scikit-learn 0.18. Starting from v0.18, you can get the version of scikit-learn used to create the estimator with:
estimator.__getstate__()['_sklearn_version']
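A minimal sketch with a toy DecisionTreeClassifier (the data is made up) showing what this expression returns. In the environment that created the estimator it simply reports the installed version, and after a pickle round trip the key is not kept as an attribute of the restored estimator.

```python
import pickle

import numpy as np
import sklearn
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# __getstate__ adds the version of the scikit-learn that is running right now
version_in_state = clf.__getstate__()["_sklearn_version"]

# Round trip through pickle in the same environment
restored = pickle.loads(pickle.dumps(clf))
restored_version = restored.__getstate__()["_sklearn_version"]

# __setstate__ consumes the stored key instead of keeping it as an attribute
stored_as_attribute = "_sklearn_version" in vars(restored)

print(version_in_state, restored_version, stored_as_attribute)
```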
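The warning you get is produced by the estimator's __setstate__ method, which is called automatically upon unpickling. That method compares the stored _sklearn_version with the installed version and then discards it, so calling __getstate__() on a loaded estimator reports the currently installed version, not the one the model was saved with. It doesn't look like there is a straightforward way of getting the saved version without loading the estimator from disk. You can filter out the warning with: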
import warnings
import joblib

with warnings.catch_warnings():
    # Suppress the version-mismatch UserWarning emitted while unpickling
    warnings.simplefilter("ignore", category=UserWarning)
    estimator = joblib.load('model.pkl')
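For pre-0.18 versions there is no such mechanism, but I imagine you could, for instance, use not hasattr(estimator, '__getstate__') as a test to detect at least pre-0.18 versions. Note that since Python 3.11 every object defines __getstate__, so this test can only work on older Python versions.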
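Since the saved version is hard to read back without unpickling, a preventive option is to record it yourself at dump time. The sketch below writes a small JSON sidecar next to the joblib file; the sidecar naming (path + ".json") and both helper functions are a hypothetical convention, not part of joblib or scikit-learn.

```python
import json
import os
import tempfile

import joblib
import sklearn
from sklearn.tree import DecisionTreeClassifier


def dump_with_metadata(model, path):
    """Save the model with joblib plus a JSON sidecar (hypothetical convention)."""
    joblib.dump(model, path)
    with open(path + ".json", "w") as f:
        json.dump({"sklearn_version": sklearn.__version__}, f)


def read_saved_version(path):
    """Read the recorded version without unpickling the model itself."""
    with open(path + ".json") as f:
        return json.load(f)["sklearn_version"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    clf = DecisionTreeClassifier().fit([[0], [1]], [0, 1])
    dump_with_metadata(clf, path)
    saved = read_saved_version(path)
    if saved != sklearn.__version__:
        print(f"model saved with {saved}, running {sklearn.__version__}")
    else:
        print("versions match")
```

Checking the sidecar before calling joblib.load lets you decide whether to load at all, instead of learning about the mismatch from a warning afterwards.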