
What does calling fit() multiple times on the same model do?

After I instantiate a scikit-learn model (e.g. LinearRegression), what happens if I call its fit() method multiple times (with different X and y data)? Does it fit the model on the new data as if I had just re-instantiated the model (i.e. from scratch), or does it take into account the data already fitted in the previous call to fit()?

From trying it with LinearRegression (and looking at its source code), it seems to me that every call to fit() fits from scratch, ignoring the result of any previous call to the same method. I wonder whether this is true in general, and whether I can rely on this behavior for all models/pipelines in scikit-learn.

Fanta asked Apr 15 '18 11:04


People also ask

What does the fit method in scikit-learn do?

The fit() method takes the training data as arguments, which can be one array in the case of unsupervised learning, or two arrays in the case of supervised learning.

What happens when you fit a model?

Model fit is a measure of how well a machine learning model generalizes to data similar to that on which it was trained. A good fit means the model accurately approximates the output when it is given unseen inputs. Fitting refers to adjusting the parameters of the model to improve its accuracy on the training data.

What is partial_fit?

partial_fit is an API for incremental learning on mini-batches of a dataset that does not fit in memory. The warm_start parameter serves a different purpose: it reduces training time when the same dataset is fitted repeatedly with different sets of hyperparameter values.


2 Answers

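If you execute model.fit(X_train, y_train) a second time, it overwrites all previously fitted coefficients, weights, intercept (bias), etc. The exception is an estimator created with warm_start=True, where fit() reuses the previous solution as its starting point.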
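A minimal sketch of this, assuming two made-up toy datasets (the names X_a, y_a, X_b, y_b are only for illustration): after the second fit(), the refitted model should match a brand-new LinearRegression fitted on the second dataset alone.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Two unrelated, noise-free toy datasets (hypothetical, for illustration only).
X_a = rng.normal(size=(50, 2))
y_a = X_a @ np.array([1.0, 2.0]) + 3.0
X_b = rng.normal(size=(50, 2))
y_b = X_b @ np.array([-4.0, 0.5]) - 1.0

model = LinearRegression()
model.fit(X_a, y_a)
coef_after_a = model.coef_.copy()

# Second call to fit(): whatever was learned from dataset A is discarded.
model.fit(X_b, y_b)

# A fresh estimator fitted only on dataset B, for comparison.
fresh = LinearRegression().fit(X_b, y_b)

refit_matches_fresh = bool(
    np.allclose(model.coef_, fresh.coef_)
    and np.isclose(model.intercept_, fresh.intercept_)
)
kept_old_coefficients = bool(np.allclose(model.coef_, coef_after_a))

print("refit equals fresh fit on B:", refit_matches_fresh)
print("coefficients from A survived:", kept_old_coefficients)
```

This only checks one estimator; it does not prove the behavior for every scikit-learn model.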

If you want to fit just a portion of your data set and then improve the model by fitting new data, you can use estimators that support "incremental learning" (those that implement the partial_fit() method).
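As a rough sketch of incremental learning, SGDRegressor is one estimator that implements partial_fit(). The data, the batch size of 100, and the errors list below are assumptions made up for this example; each partial_fit() call continues from the current weights instead of starting over.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)

# Hypothetical toy regression problem.
true_w = np.array([1.5, -2.0, 0.5])
X = rng.normal(size=(1000, 3))
y = X @ true_w + 0.1 * rng.normal(size=1000)

model = SGDRegressor(random_state=0)

# Feed the data in mini-batches of 100 rows and track the error on the
# full dataset after each batch.
errors = []
for start in range(0, len(X), 100):
    stop = start + 100
    model.partial_fit(X[start:stop], y[start:stop])
    errors.append(float(np.mean((model.predict(X) - y) ** 2)))

print("MSE after first batch:", errors[0])
print("MSE after last batch: ", errors[-1])
```

If the model were reset on every call, the error would not keep improving as more batches arrive.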

MaxU - stop WAR against UA answered Oct 08 '22 00:10


You can use the terms fit() and train() interchangeably in machine learning. Depending on the classification model you have instantiated, for example clf = GaussianNB() or clf = SVC(), your model uses the corresponding machine learning technique.
As soon as you call clf.fit(features_train, label_train), your model starts training on the features and labels that you passed.

You can then use clf.predict(features_test) to predict.
If you call clf.fit(features_train2, label_train2) again, it starts training again on the newly passed data and discards the previous results. The following are reset inside the model:

  • Weights
  • Fitted Coefficients
  • Bias
  • And other training related stuff...

You can also use the partial_fit() method, where the estimator provides one, if you want to keep what was previously learned and continue training on the next batch of data.
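The difference between the two methods can be sketched with GaussianNB, which offers both fit() and partial_fit(). The two batches below are made-up toy data; class_count_ is the fitted attribute holding the number of training samples seen per class, so it shows which batches the model still "remembers".

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Two hypothetical batches of labelled data: 40 rows, then 60 rows.
X1 = rng.normal(size=(40, 2))
y1 = (X1[:, 0] > 0).astype(int)
X2 = rng.normal(size=(60, 2))
y2 = (X2[:, 0] > 0).astype(int)

# fit() twice: the second call replaces everything learned from batch 1.
refit = GaussianNB()
refit.fit(X1, y1)
refit.fit(X2, y2)
samples_seen_refit = int(refit.class_count_.sum())

# partial_fit() twice: batch 2 is added on top of batch 1.
# The full list of classes must be given on the first call.
incremental = GaussianNB()
incremental.partial_fit(X1, y1, classes=np.array([0, 1]))
incremental.partial_fit(X2, y2)
samples_seen_incremental = int(incremental.class_count_.sum())

print("samples remembered after fit() twice:        ", samples_seen_refit)
print("samples remembered after partial_fit() twice:", samples_seen_incremental)
```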

sgrpwr answered Oct 08 '22 02:10