
Sklearn Fit model multiple times

The problem is a common one: there is a lot of training data, which is read in chunks. The goal is to fit the desired model sequentially on the chunked data sets, keeping the state from the previous fits.

Are there any methods other than partial_fit() to fit a model with sklearn on different data? Are there any tricks for rewriting the fit() function to customize it for this problem? Or can this somehow be done with pickle?

Marcel Mars asked Aug 11 '16 11:08


People also ask

Can I call model fit multiple times?

In fact you can call fit many times instead of setting epochs and will work mostly the same.

What happens if you fit a model twice?

If you execute model.fit(X_train, y_train) a second time, it overwrites all previously fitted coefficients, weights, intercept (bias), etc. Some estimators (those with a warm_start parameter) reuse the solution from the previous call to fit() as the starting point for the new call when warm_start=True.
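To make the difference concrete, here is a minimal sketch on synthetic data (the arrays and the choice of LinearRegression and RandomForestRegressor are assumptions made for illustration). What warm_start does depends on the estimator: for a random forest it keeps the existing trees and adds new ones, while for iterative linear models it reuses the previous coefficients as the initial guess.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Two synthetic "chunks" with different underlying relationships (illustrative only).
rng = np.random.default_rng(0)
X_a, X_b = rng.normal(size=(200, 3)), rng.normal(size=(200, 3))
y_a = X_a @ np.array([1.0, 2.0, 3.0])
y_b = X_b @ np.array([-1.0, 0.0, 4.0])

# Plain fit(): the second call discards what was learned from the first chunk.
refit = LinearRegression().fit(X_a, y_a).fit(X_b, y_b)
fresh = LinearRegression().fit(X_b, y_b)
same_as_fresh = np.allclose(refit.coef_, fresh.coef_)

# warm_start=True on a forest: the 10 trees from chunk A are kept,
# and 10 more are trained on chunk B after n_estimators is raised.
forest = RandomForestRegressor(n_estimators=10, warm_start=True, random_state=0)
forest.fit(X_a, y_a)
forest.set_params(n_estimators=20)
forest.fit(X_b, y_b)
n_trees = len(forest.estimators_)
```

Here same_as_fresh shows that the refitted linear model is indistinguishable from one that never saw the first chunk, while the forest ends up with trees from both chunks.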

What does fit () do in Sklearn?

The 'fit' method trains the algorithm on the training data, after the model is initialized. That's really all it does. So the sklearn fit method uses the training data as an input to train the machine learning model.

What is partial fit in Sklearn?

partial_fit is a handy API for incremental learning on mini-batches of a dataset that does not fit in memory. The primary purpose of warm_start, by contrast, is to reduce training time when fitting the same dataset with different sets of hyperparameter values.


1 Answer

There is a reason why some models expose partial_fit() and others don't. Every model is a different machine learning algorithm and for many of these algorithms there is just no way to add an element without recalculating the model from scratch.

So, if you have to fit the models incrementally, pick an incremental model that has partial_fit(). You can find a full list on this documentation page.
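As a rough sketch of what that looks like, the loop below feeds chunks to SGDClassifier, one of the estimators that implements partial_fit(). The iter_chunks helper and the synthetic data are hypothetical stand-ins for whatever reads your real data in chunks.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def iter_chunks(X, y, chunk_size):
    """Yield consecutive (X, y) chunks; a stand-in for reading chunks from disk."""
    for start in range(0, len(X), chunk_size):
        yield X[start:start + chunk_size], y[start:start + chunk_size]


# Synthetic, linearly separable data (an assumption made for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = SGDClassifier(random_state=0)
all_classes = np.unique(y)  # every label must be declared on the first call

for X_chunk, y_chunk in iter_chunks(X, y, chunk_size=100):
    # Each call updates the existing coefficients instead of resetting them.
    model.partial_fit(X_chunk, y_chunk, classes=all_classes)

accuracy = model.score(X, y)
```

The classes argument matters because a single chunk may not contain every label. As for pickle: it can save and restore the fitted object between chunks or sessions, but it only stores state and does not make fit() incremental.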

Alternatively, you can build an ensemble model. Create a separate Classifier() or Regressor() for every chunk of data you have. Then, when you need to predict something, you can take a majority vote:

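import numpy
# one row of predicted labels per classifier; labels assumed to be non-negative integers
all_preds = numpy.array([classifier.predict(X) for classifier in classifiers])
# majority vote for each sample (column)
prediction = numpy.array([numpy.bincount(col).argmax() for col in all_preds.T])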

or, for regressors

prediction = numpy.mean([regressor.predict(X) for regressor in regressors], axis=0)
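The snippets above leave the training step implicit. A minimal end-to-end sketch for the classifier case might look like this, assuming synthetic data split into three chunks, DecisionTreeClassifier as the per-chunk model, and integer class labels (predict_majority is a hypothetical helper name):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, split into three chunks (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = (X[:, 0] > 0).astype(int)

# Train one independent classifier per chunk.
classifiers = []
for X_chunk, y_chunk in zip(np.array_split(X, 3), np.array_split(y, 3)):
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    classifiers.append(clf.fit(X_chunk, y_chunk))


def predict_majority(classifiers, X):
    """Return the most common predicted label per sample (integer labels assumed)."""
    all_preds = np.array([clf.predict(X) for clf in classifiers])
    return np.array([np.bincount(col).argmax() for col in all_preds.T])


prediction = predict_majority(classifiers, X)
accuracy = np.mean(prediction == y)
```

Each model only ever sees its own chunk, so this approach works with any estimator, at the cost of no model learning from the full data set.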
0x60 answered Oct 05 '22 22:10