
How to clone a scikit-learn estimator including its data?

I am attempting to perform a partial fit on a naive Bayes estimator while also retaining a copy of the estimator as it was before the partial fit. sklearn.base.clone only clones an estimator's parameters, not its data, so it is not useful in this case. Performing a partial fit on the clone uses only the data added during the partial fit, since the clone is effectively empty.
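The behaviour of clone described above can be checked with a small sketch. The arrays X, y, Z and w here are made-up toy counts, not the data from the question, and class_count_ is used only as a convenient way to see how many samples an estimator has absorbed.

```python
import numpy as np
from sklearn.base import clone
from sklearn.naive_bayes import MultinomialNB

# Toy count data, invented for illustration only.
X = np.array([[2, 0, 1], [0, 3, 1], [1, 1, 0], [0, 2, 2]])
y = np.array([0, 1, 0, 1])
Z = np.array([[3, 0, 0], [0, 1, 4]])
w = np.array([0, 1])

model = MultinomialNB().fit(X, y)

# clone() builds a new, unfitted estimator with the same hyperparameters.
empty = clone(model)
has_counts_before = hasattr(empty, "class_count_")

# partial_fit on the clone therefore only ever sees Z and w.
empty.partial_fit(Z, w, classes=np.unique(y))
clone_samples_seen = empty.class_count_.sum()
original_samples_seen = model.class_count_.sum()
```

The clone ends up having seen only the samples in Z, while the original still reflects only X.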

import numpy as np
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
fit_model = model.fit(np.array(X), np.array(y))
fit_model2 = model.partial_fit(np.array(Z), np.array(w), np.unique(y))

In the above example, fit_model and fit_model2 are the same, since they both point to the same object. I would like to keep the original copy unaltered. My workaround is to pickle the original and load it into a new object, then perform the partial fit on that, like this:

model = MultinomialNB()
fit_model = model.fit(np.array(X),np.array(y))

import pickle
with open('saved_model', 'wb') as f:
    pickle.dump([model], f)

with open('saved_model', 'rb') as f:
    [model2] = pickle.load(f) 

fit_model2 = model2.partial_fit(np.array(Z), np.array(w), np.unique(y))

I could also refit completely with the new data each time, but since I need to do this thousands of times, I'm looking for something more efficient.

asked Nov 06 '15 at 21:11 by N.Harrison


2 Answers

  1. model.fit() returns the model itself (the same object), so you don't have to assign the result to a different variable; doing so only creates an alias.

  2. You can use deepcopy to copy the object, much as loading a pickled object does.

So if you do something like:

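from copy import deepcopy

import numpy as np
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(np.array(X), np.array(y))

# Independent copy that keeps the fitted state
model2 = deepcopy(model)

# Updates model2 only; model is left untouched
model2.partial_fit(np.array(Z), np.array(w), np.unique(y))
# ...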

model2 will be a distinct object, with the copied parameters of model, including the "trained" parameters.
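Both points can be checked with a short sketch. The data below is invented for illustration, and class_count_ is used here only as a simple way to see how many samples each estimator has absorbed.

```python
from copy import deepcopy

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count data, invented for illustration only.
X = np.array([[2, 0, 1], [0, 3, 1], [1, 1, 0], [0, 2, 2]])
y = np.array([0, 1, 0, 1])
Z = np.array([[3, 0, 0], [0, 1, 4]])
w = np.array([0, 1])

model = MultinomialNB()
returned = model.fit(X, y)
same_object = returned is model  # fit() hands back the estimator itself

# The deep copy carries over the fitted counts, so partial_fit adds to them.
model2 = deepcopy(model)
model2.partial_fit(Z, w, classes=np.unique(y))

copy_samples_seen = model2.class_count_.sum()      # X and Z together
original_samples_seen = model.class_count_.sum()   # X only
```

Unlike a clone, the copy reflects both the original data and the new batch, and the original is left as it was.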

answered Oct 28 '22 at 21:10 by yprez


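from copy import deepcopy

import numpy as np
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(np.array(X), np.array(y))

model2 = deepcopy(model)

# Weight vectors right after the copy.
# Note: coef_ was deprecated for MultinomialNB in scikit-learn 0.24 and later
# removed, so feature_log_prob_ is used here instead.
weight_vector_model = np.array(model.feature_log_prob_[0])
weight_vector_model2 = np.array(model2.feature_log_prob_[0])

model2.partial_fit(np.array(Z), np.array(w), np.unique(y))

# Weight vectors after updating model2 only
weight_vector_model = np.array(model.feature_log_prob_[0])
weight_vector_model2 = np.array(model2.feature_log_prob_[0])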

model and model2 are now completely separate objects, so partial_fit() on model2 has no impact on model. The two weight vectors are the same after deepcopy but differ after partial_fit() on model2.
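Because the copy is independent, one fitted base model can serve as the starting point for many separate updates, which is the situation the question describes. The following is a minimal sketch under assumed conditions: the random count data and the list of update batches are hypothetical, and whether deepcopy is fast enough for thousands of repetitions would need to be measured on the real data.

```python
from copy import deepcopy

import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)

# Hypothetical base training set: 20 samples, 4 count features, 2 classes.
X = rng.integers(0, 5, size=(20, 4))
y = np.arange(20) % 2
base = MultinomialNB().fit(X, y)
snapshot = base.feature_count_.copy()

# Hypothetical small update batches, each applied to its own copy of base.
batches = [
    (rng.integers(0, 5, size=(3, 4)), np.array([0, 1, 0]))
    for _ in range(5)
]

updated = []
for Z, w in batches:
    candidate = deepcopy(base)  # starts from the fitted state of base
    candidate.partial_fit(Z, w, classes=base.classes_)
    updated.append(candidate)

# base never changes, and every candidate has seen 20 + 3 samples.
base_unchanged = np.array_equal(base.feature_count_, snapshot)
samples_seen = [int(m.class_count_.sum()) for m in updated]
```

Each candidate is updated in isolation, so the base model never needs to be refit between batches.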

answered Oct 28 '22 at 22:10 by Anuj Gupta