 

How to save a random forest in scikit-learn?

There are already a lot of questions about model persistence, and I have tried both pickle and joblib.dump, but when I use them to save my random forest I get this:

ValueError: ("Buffer dtype mismatch, expected 'SIZE_t' but got 'long'", <type 'sklearn.tree._tree.ClassificationCriterion'>, (1, array([10])))

Can anyone tell me why?

Here is the code I am using:

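import cPickle
from sklearn.ensemble import RandomForestClassifier

# train on the first n_samples rows
forest = RandomForestClassifier()
forest.fit(data[:n_samples], target[:n_samples])

# save the fitted forest
with open('rf.pkl', 'wb') as f:
    cPickle.dump(forest, f)

# load it back
with open('rf.pkl', 'rb') as f:
    forest = cPickle.load(f)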

or

# save
from sklearn.externals import joblib
joblib.dump(forest, 'rf.pkl')

# load
from sklearn.externals import joblib
forest = joblib.load('rf.pkl')
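For readers on Python 3, where cPickle was folded into the standard pickle module, a self-contained version of the same pickle round trip might look like the sketch below. It assumes scikit-learn is installed; the synthetic dataset from make_classification stands in for the question's data/target arrays, and the file name is only illustrative.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for the question's data/target arrays (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X, y)

# Save the fitted model; "rf.pkl" is an illustrative file name.
with open("rf.pkl", "wb") as f:
    pickle.dump(forest, f)

# Load it back, ideally with the same Python, NumPy and scikit-learn setup.
with open("rf.pkl", "rb") as f:
    loaded = pickle.load(f)

# The restored forest should reproduce the original predictions exactly.
same = bool((loaded.predict(X) == forest.predict(X)).all())
print(same)
```

If this round trip works on one machine but loading fails on another, the difference between the two environments is the thing to investigate.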
asked Dec 22 '14 02:12 by mrbean





2 Answers

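This is caused by saving and loading the model with different builds of Python (32-bit vs. 64-bit), as the question "Scikits-Learn RandomForrest trained on 64bit python wont open on 32bit python" suggests. A forest trained and pickled on 64-bit Python will not load on 32-bit Python.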
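To compare the two environments, a quick check like the sketch below can be run on both the machine that saved the model and the one loading it. It only reads the interpreter's pointer size and the width of NumPy's pointer-sized integer type, np.intp. My understanding (an assumption, not verified against every scikit-learn release) is that older tree code declared its SIZE_t index type as pointer-sized, which is why arrays pickled on a 64-bit build are rejected by a 32-bit build.

```python
import struct
import sys

import numpy as np

# Pointer size of the running interpreter, in bits (32 or 64).
python_bits = struct.calcsize("P") * 8

# sys.maxsize is 2**63 - 1 on 64-bit builds and 2**31 - 1 on 32-bit builds.
is_64bit = sys.maxsize > 2**32

# np.intp is NumPy's pointer-sized integer, so its width follows the build.
intp_bits = np.dtype(np.intp).itemsize * 8

print(f"Python: {python_bits}-bit, np.intp: {intp_bits}-bit")
```

If the two machines print different widths, retraining the model on the target machine (or using matching builds) is the safer route.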

answered Sep 22 '22 04:09 by xgdgsc



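Try importing the standalone joblib package directly instead of going through sklearn.externals (sklearn.externals.joblib was deprecated in scikit-learn 0.21 and removed in 0.23):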

import joblib

# ...

# save
joblib.dump(rf, "some_path")

# load 
rf2 = joblib.load("some_path")

I've put the full working example with the code and comments here.
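A self-contained version of the same idea might look like the sketch below. It is not the linked example; it assumes joblib and scikit-learn are installed, uses the Iris dataset as stand-in training data, and the file path is hypothetical. The `compress` argument of `joblib.dump` is optional and trades a little save/load time for a smaller file.

```python
import joblib  # standalone package, installed separately from scikit-learn

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Iris is only stand-in training data for this sketch.
X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(n_estimators=20, random_state=42)
rf.fit(X, y)

# Hypothetical path; compress=3 gives a smaller file at a small speed cost.
path = "random_forest.joblib"
joblib.dump(rf, path, compress=3)

# Load the model back and check it behaves like the original.
rf2 = joblib.load(path)
match = bool((rf2.predict(X) == rf.predict(X)).all())
print(match)
```

Since joblib is built on pickle, the same environment-compatibility caveats apply: load the file with matching Python, NumPy and scikit-learn builds.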

answered Sep 22 '22 04:09 by pplonski
