How to clone a scikit-learn estimator including its data?

Question

I am attempting to perform a partial fit on a naive Bayes estimator while retaining a copy of the estimator as it was before the partial fit. sklearn.base.clone only clones an estimator's parameters, not its data, so it is not useful in this case. Performing a partial fit on the clone uses only the data added during the partial fit, since the clone is effectively empty.
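The behaviour described above can be seen in a minimal sketch. The toy count matrix, the labels, and the alpha value are made up for illustration; only clone and MultinomialNB come from scikit-learn.

```python
import numpy as np
from sklearn.base import clone
from sklearn.naive_bayes import MultinomialNB

# Made-up toy data: 4 samples, 3 count features, 2 classes.
X = np.array([[2, 1, 0], [1, 0, 3], [0, 2, 1], [3, 0, 0]])
y = np.array([0, 1, 0, 1])

model = MultinomialNB(alpha=0.5).fit(X, y)
cloned = clone(model)

# Hyperparameters are carried over to the clone...
print(cloned.get_params()["alpha"])      # 0.5
# ...but the fitted state (attributes ending in "_") is not.
print(hasattr(model, "class_count_"))    # True
print(hasattr(cloned, "class_count_"))   # False
```

The clone is a fresh, unfitted estimator, which is why a partial fit on it only sees the new data.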

import numpy as np
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
fit_model = model.fit(np.array(X), np.array(y))
fit_model2 = model.partial_fit(np.array(Z), np.array(w), np.unique(y))

In the above example, fit_model and fit_model2 are the same, since both names point to the same object. I would like to keep the original unaltered. My workaround is to pickle the original and load it into a new object, then perform the partial fit on that new object, like this:

model = MultinomialNB()
fit_model = model.fit(np.array(X),np.array(y))

import pickle
with open('saved_model', 'wb') as f:
    pickle.dump([model], f)

with open('saved_model', 'rb') as f:
    [model2] = pickle.load(f) 

fit_model2 = model2.partial_fit(np.array(Z), np.array(w), np.unique(y))

I could also refit completely with the new data each time, but since I need to do this thousands of times, I'm looking for something more efficient.

yprez · Accepted Answer

model.fit() returns the model itself (the same object), so assigning the result to a different variable only creates an alias.
You can use deepcopy to copy the object, much as loading a pickled object does.

So if you do something like:

from copy import deepcopy

model = MultinomialNB()
model.fit(np.array(X), np.array(y))

model2 = deepcopy(model)

model2.partial_fit(np.array(Z), np.array(w), np.unique(y))
# ...

model2 will be a distinct object, with the copied parameters of model, including the "trained" parameters.
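As a rough illustration of the point above, the sketch below snapshots a fitted model with deepcopy and with an in-memory pickle round trip (pickle.dumps / pickle.loads, which avoids the temporary file used in the question), then updates only the copies. The arrays are made-up stand-ins for the asker's X, y, Z and w.

```python
import pickle
from copy import deepcopy

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Made-up toy data standing in for X, y (initial fit) and Z, w (update).
X = np.array([[2, 1, 0], [1, 0, 3], [0, 2, 1], [3, 0, 0]])
y = np.array([0, 1, 0, 1])
Z = np.array([[1, 1, 1], [0, 4, 0]])
w = np.array([1, 0])

model = MultinomialNB().fit(X, y)
counts_before = model.class_count_.copy()

# Two ways to snapshot the fitted estimator without touching the disk.
copy_a = deepcopy(model)
copy_b = pickle.loads(pickle.dumps(model))

# classes is only required on the first partial_fit call of an unfitted
# estimator; it is passed here to mirror the question.
copy_a.partial_fit(Z, w, classes=np.unique(y))
copy_b.partial_fit(Z, w, classes=np.unique(y))

# The original keeps its fitted state; only the copies saw Z and w.
print(np.array_equal(model.class_count_, counts_before))          # True
print(copy_a.class_count_)                                        # [3. 3.]
print(np.array_equal(copy_a.class_count_, copy_b.class_count_))   # True
```

Both approaches give an independent fitted copy; deepcopy is simply the shorter spelling when the copy never needs to leave the process.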

Anuj Gupta · Answer

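from copy import deepcopy

import numpy as np
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(np.array(X), np.array(y))

model2 = deepcopy(model)

# Newer scikit-learn releases no longer expose coef_ on MultinomialNB;
# feature_log_prob_ is the fitted attribute to inspect instead.
weight_vector_model = np.array(model.feature_log_prob_[0])
weight_vector_model2 = np.array(model2.feature_log_prob_[0])

model2.partial_fit(np.array(Z), np.array(w), np.unique(y))

weight_vector_model = np.array(model.feature_log_prob_[0])
weight_vector_model2 = np.array(model2.feature_log_prob_[0])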

model and model2 are now completely separate objects, so partial_fit() on model2 has no impact on model. The two weight vectors are the same after deepcopy but differ after partial_fit() on model2.
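The claim can be checked directly. The following is a small sketch on made-up count data (the arrays are assumptions, not the asker's data); it compares feature_log_prob_, the per-class feature log probabilities that MultinomialNB stores after fitting. It also compares the updated copy against a model refitted from scratch on all the data, since MultinomialNB accumulates counts across partial_fit calls.

```python
from copy import deepcopy

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Made-up toy data standing in for X, y (initial fit) and Z, w (update).
X = np.array([[2, 1, 0], [1, 0, 3], [0, 2, 1], [3, 0, 0]])
y = np.array([0, 1, 0, 1])
Z = np.array([[1, 1, 1], [0, 4, 0]])
w = np.array([1, 0])

model = MultinomialNB().fit(X, y)
model2 = deepcopy(model)

# Right after the copy, both objects hold identical fitted values.
same_after_copy = np.array_equal(model.feature_log_prob_,
                                 model2.feature_log_prob_)

# Update only the copy; classes can be omitted because model2 is already fitted.
model2.partial_fit(Z, w)
differ_after_update = not np.array_equal(model.feature_log_prob_,
                                         model2.feature_log_prob_)

# Full refit on the combined data, for comparison with the incremental update.
refit = MultinomialNB().fit(np.vstack([X, Z]), np.concatenate([y, w]))
matches_refit = np.allclose(model2.feature_log_prob_, refit.feature_log_prob_)

print(same_after_copy, differ_after_update, matches_refit)  # True True True
```

On this toy data the incrementally updated copy agrees with the full refit, which is what makes copy-then-partial_fit a cheaper substitute for refitting each time.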

Tags: python, python-3.x, scikit-learn

N.Harrison
