 

scikit-learn refit/partial fit option in Classifers

I am wondering whether sklearn classifiers offer any option to fit a model with some hyperparameters and then, after changing a few hyperparameter(s), refit the model while saving some of the computation (fit) cost.

Let us say Logistic Regression is fit with C=1e5 (logreg = linear_model.LogisticRegression(C=1e5)) and we then change only C to C=1e3. I want to save some computation because only one parameter has changed.

asked Aug 11 '17 by Techie Fort

People also ask

What is partial fit in Sklearn?

partial_fit is a handy API that can be used to perform incremental learning on mini-batches of an out-of-memory dataset. The primary purpose of warm_start is to reduce training time when refitting the same dataset with different sets of hyperparameter values.
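For illustration, here is a minimal sketch of incremental learning with partial_fit, using SGDClassifier and synthetic mini-batches as a stand-in for an out-of-memory dataset (the toy data is an assumption, not from the question):

import numpy as np
from sklearn.linear_model import SGDClassifier

# toy data standing in for a dataset too large to fit in memory
rng = np.random.RandomState(0)
X = rng.randn(1000, 20)
y = (X[:, 0] > 0).astype(int)

clf = SGDClassifier()
# the first call to partial_fit must receive the full set of class labels
for X_batch, y_batch in zip(np.array_split(X, 10), np.array_split(y, 10)):
    clf.partial_fit(X_batch, y_batch, classes=np.array([0, 1]))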

What does the fit() method of sklearn do?

The fit method trains the algorithm on the training data after the model is initialized. That's really all it does: the sklearn fit method uses the training data as an input to train the machine learning model.
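As a minimal illustration of that initialize-then-fit pattern (the toy data is made up for the example):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# toy data, assumed for the example
X, y = make_classification(n_samples=200, random_state=0)

clf = LogisticRegression()  # 1. initialize the model
clf.fit(X, y)               # 2. fit learns the coefficients from the training data
clf.predict(X[:5])          # 3. the trained model can now make predictions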

What does fit() do in regression?

Use Fit Regression Model to describe the relationship between a set of predictors and a continuous response using the ordinary least squares method. You can include interaction and polynomial terms, perform stepwise regression, and transform skewed data.
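The quoted description refers to a dedicated statistics tool, but the same ordinary least squares fit can be sketched in scikit-learn (toy predictor/response values assumed):

import numpy as np
from sklearn.linear_model import LinearRegression

# made-up predictor/response pairs
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 8.1])

ols = LinearRegression().fit(X, y)  # ordinary least squares fit
ols.coef_, ols.intercept_           # estimated slope and intercept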

What is intercept_scaling?

It is useful only when the 'liblinear' solver is used and fit_intercept is set to True. In that case, the instance vector x becomes [x, intercept_scaling], i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight. Note: the synthetic feature weight is subject to l1/l2 regularization like all other features.
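To make the mechanism concrete, a tiny sketch of the augmented instance vector described above (the values are made up):

import numpy as np

x = np.array([0.5, -1.2, 3.0])                 # original instance vector
intercept_scaling = 1.0
x_augmented = np.append(x, intercept_scaling)  # [x, intercept_scaling]
# liblinear learns a weight for this synthetic last feature;
# the intercept is then intercept_scaling * that weight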


1 Answer

Yes, there is a parameter called warm_start which, quoting the documentation, means:

warm_start : bool, default: False
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver.

As described in the documentation, it's available in LogisticRegression:

sklearn.linear_model.LogisticRegression(..., warm_start=False, n_jobs=1)

So concretely, for your case you would do the following:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# toy data so the example is runnable
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# create an instance of LogisticRegression with warm_start=True
logreg = LogisticRegression(C=1e5, warm_start=True)

# you can access the C parameter's value as follows
logreg.C
# it's set to 100000.0

# train the model with C=1e5
logreg.fit(X, y)

# reset the value of the C parameter as follows
logreg.C = 1e3

logreg.C
# now it's set to 1000.0

# re-train: warm_start=True reuses the previous solution as the
# starting point, which can save computation on this second fit
logreg.fit(X, y)

As far as I have been able to check quickly, it's also available in the following (see the sketch after the list):

  • sklearn.ensemble.RandomForestClassifier
  • sklearn.ensemble.GradientBoostingClassifier
  • sklearn.linear_model.PassiveAggressiveClassifier
  • sklearn.ensemble.BaggingClassifier
  • sklearn.ensemble.ExtraTreesClassifier
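As a sketch of what warm_start buys you with one of these, refitting a RandomForestClassifier after raising n_estimators adds only the new trees instead of rebuilding the whole forest (toy data assumed):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

rf = RandomForestClassifier(n_estimators=50, warm_start=True, random_state=0)
rf.fit(X, y)  # trains 50 trees

rf.n_estimators = 100
rf.fit(X, y)  # adds 50 more trees, reusing the first 50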
answered Sep 21 '22 by Mohamed Ali JAMAOUI