The problem is a common one: there is a lot of training data, and it is read in chunks. The goal is to fit the desired model sequentially on the chunked data sets while keeping the state from the previous fits.
Are there any methods other than partial_fit() for fitting a model with sklearn on different data? Or are there any tricks for rewriting the code of fit() to customize it for this problem? Or can this somehow be done with pickle?
In frameworks where fit() resumes from the current weights (Keras, for example), you can call fit many times instead of setting epochs, and it will work mostly the same.
In scikit-learn, executing model.fit(X_train, y_train) a second time overwrites all previously fitted coefficients, weights, intercept (bias), etc. Some estimators (those with a warm_start parameter) reuse the solution from the previous call to fit() as the initial solution for the new call when warm_start=True.
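To make the overwrite behaviour concrete, here is a minimal sketch. It assumes two small synthetic chunks (X_a/y_a and X_b/y_b are made-up names and data, not from the question) and uses LinearRegression only because its solution is easy to compare.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Two hypothetical chunks drawn from different linear relationships
X_a = rng.normal(size=(50, 2))
y_a = X_a @ np.array([1.0, 2.0])
X_b = rng.normal(size=(50, 2))
y_b = X_b @ np.array([-3.0, 0.5])

model = LinearRegression()
model.fit(X_a, y_a)  # learns the relationship in chunk A
model.fit(X_b, y_b)  # discards it and learns from chunk B only

# A fresh model trained on chunk B alone ends up with the same coefficients
only_b = LinearRegression().fit(X_b, y_b)
refit_equals_fresh = np.allclose(model.coef_, only_b.coef_)
```

After the second call nothing learned from chunk A remains, which is why repeated fit() calls cannot stand in for incremental training here.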
The fit method trains the algorithm on the training data after the model has been initialized; that is really all it does.
partial_fit is a handy API for incremental learning on mini-batches of an out-of-memory dataset. The primary purpose of warm_start, by contrast, is to reduce training time when fitting the same dataset with different sets of hyperparameter values.
There is a reason why some models expose partial_fit() and others don't. Every model is a different machine learning algorithm, and for many of these algorithms there is just no way to add an element without recalculating the model from scratch.
So, if you have to fit the models incrementally, pick an incremental model that has partial_fit(). You can find a full list on this documentation page.
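As a rough sketch of that approach, the loop below feeds chunks to SGDClassifier, one of the estimators that supports partial_fit(). The synthetic data and the five-way split are assumptions standing in for whatever chunked reader you actually use.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for a large dataset: two well-separated classes
X = np.vstack([rng.normal(-2, 1, size=(500, 2)),
               rng.normal(2, 1, size=(500, 2))])
y = np.array([0] * 500 + [1] * 500)
order = rng.permutation(len(y))
X, y = X[order], y[order]

clf = SGDClassifier(random_state=0)
# Classifiers need the full label set on the first partial_fit call,
# because a single chunk may not contain every class
all_classes = np.unique(y)

# Feed the data one chunk at a time; the model state carries over between calls
for X_chunk, y_chunk in zip(np.array_split(X, 5), np.array_split(y, 5)):
    clf.partial_fit(X_chunk, y_chunk, classes=all_classes)

accuracy = clf.score(X, y)
```

In a real pipeline you would usually know the label set up front rather than compute it from the full y, since the full data is exactly what does not fit in memory.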
Alternatively, you can build an ensemble model. Create a separate Classifier() or Regressor() for every chunk of data you have. Then, when you need to predict something, you can just take a majority vote:
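import numpy
# X holds a single sample, shape (1, n_features); labels are assumed to be
# integers 0..n_classes-1, where n_classes is a value you supply
votes = numpy.zeros(n_classes, dtype=int)
for classifier in classifiers:
    votes[classifier.predict(X)[0]] += 1
prediction = numpy.argmax(votes)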
or, for regressors, average the predictions:
prediction = numpy.mean([regressor.predict(X) for regressor in regressors], axis=0)
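For a batch of samples rather than a single one, the voting idea might look like the sketch below. The choice of DecisionTreeClassifier, the synthetic data, and the three-way split are all assumptions made for illustration; any classifier that returns non-negative integer labels would work the same way.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # integer labels 0 and 1

# One independent classifier per chunk
classifiers = [
    DecisionTreeClassifier(random_state=0).fit(X_chunk, y_chunk)
    for X_chunk, y_chunk in zip(np.array_split(X, 3), np.array_split(y, 3))
]

X_new = rng.normal(size=(10, 3))
# all_votes[i, j] is the label classifier i assigns to sample j
all_votes = np.array([clf.predict(X_new) for clf in classifiers])
# Majority vote per sample (column); bincount needs non-negative integer labels
prediction = np.array([np.bincount(column).argmax() for column in all_votes.T])
```

Note that each classifier has only seen its own chunk, so the ensemble is not equivalent to one model trained on all the data.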