There are already a lot of questions about persistence, and I have tried both pickle and joblib.dump. But when I use either of them to save my random forest, I get this:
ValueError: ("Buffer dtype mismatch, expected 'SIZE_t' but got 'long'", <type 'sklearn.tree._tree.ClassificationCriterion'>, (1, array([10])))
Can anyone tell me why?
Some code for review:
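from sklearn.ensemble import RandomForestClassifier
# data, target and n_samples come from my own dataset
forest = RandomForestClassifier()
forest.fit(data[:n_samples], target[:n_samples])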
import cPickle
# save
with open('rf.pkl', 'wb') as f:
    cPickle.dump(forest, f)
# load
with open('rf.pkl', 'rb') as f:
    forest = cPickle.load(f)
or
from sklearn.externals import joblib
joblib.dump(forest, 'rf.pkl')
# later, e.g. in another session
from sklearn.externals import joblib
forest = joblib.load('rf.pkl')
Step 1: The algorithm selects random samples (bootstrap samples, drawn with replacement) from the provided dataset.
Step 2: It builds a decision tree for each sample, then gets a prediction from each of those trees.
Step 3: A vote is taken over the predicted results, and the majority vote becomes the final prediction.
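The three steps above can be sketched in plain Python. This is a simplified illustration, not scikit-learn's implementation: each "tree" is a depth-1 decision stump, and the random feature subsetting that real random forests also perform at each split is omitted. The names bootstrap_sample, fit_stump, fit_forest and predict are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(X, y, rng):
    """Step 1: draw len(X) rows at random, with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y):
    """Step 2 (simplified): a depth-1 tree choosing the single
    feature/threshold split with the fewest training errors."""
    overall = Counter(y).most_common(1)[0][0]  # fallback for an empty side
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            # each side predicts its own majority label
            left_label = Counter(left).most_common(1)[0][0] if left else overall
            right_label = Counter(right).most_common(1)[0][0] if right else overall
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label


def fit_forest(X, y, n_trees=25, seed=0):
    """Steps 1 and 2: one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(*bootstrap_sample(X, y, rng)) for _ in range(n_trees)]


def predict(forest, row):
    """Step 3: every tree votes; the majority label wins."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
    y = [0, 0, 0, 1, 1, 1]
    forest = fit_forest(X, y)
    print(predict(forest, [0.5]), predict(forest, [20.0]))
```

Because each tree sees a different bootstrap sample, individual trees can disagree; the vote in Step 3 smooths those differences out.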
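The error is caused by saving and loading the model with different (32-bit vs. 64-bit) builds of Python, as the question "Scikits-Learn RandomForrest trained on 64bit python wont open on 32bit python" suggests. The tree's internal arrays use SIZE_t, a pointer-sized integer type, so arrays pickled with the integer width of one platform do not match the SIZE_t expected on the other.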
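To confirm the diagnosis, you can check the bitness of each interpreter and store it next to the model. The sketch below is one possible approach, not part of scikit-learn or joblib: save_with_env and load_with_env are hypothetical helpers that write a small header pickle before the model, so the mismatch is reported before the model itself is unpickled. It only detects the problem; the fix is still to load with the same bitness or re-train on the target platform. The demo uses a plain dict as a stand-in for the forest.

```python
import os
import pickle
import struct
import tempfile


def python_bits():
    """Pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def save_with_env(obj, path):
    # Hypothetical helper: write a small header first, then the model,
    # as two consecutive pickles in the same file.
    with open(path, "wb") as f:
        pickle.dump({"python_bits": python_bits()}, f)
        pickle.dump(obj, f)


def load_with_env(path):
    # Hypothetical helper: read only the header, compare bitness,
    # and unpickle the model only if the interpreters match.
    with open(path, "rb") as f:
        header = pickle.load(f)
        if header["python_bits"] != python_bits():
            raise RuntimeError(
                "saved on %d-bit Python but loading on %d-bit Python; "
                "use the same bitness or re-train the model"
                % (header["python_bits"], python_bits()))
        return pickle.load(f)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "rf.pkl")
    save_with_env({"stand_in": "model"}, path)  # a real forest would go here
    print(python_bits(), load_with_env(path))
```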
Try importing the joblib package directly (sklearn.externals.joblib was deprecated in scikit-learn 0.21 and removed in 0.23):
import joblib
# ...
# save
joblib.dump(rf, "some_path")
# load
rf2 = joblib.load("some_path")
I've put the full working example with the code and comments here.