 

What does the CV stand for in sklearn.linear_model.LogisticRegressionCV?


scikit-learn has two logistic regression functions:

  • sklearn.linear_model.LogisticRegression
  • sklearn.linear_model.LogisticRegressionCV

I'm just curious what the CV stands for in the second one. The only acronym I know in ML that matches "CV" is cross-validation, but I'm guessing that's not it, since that would be achieved in scikit-learn with a wrapper function, not as part of the logistic regression function itself (I think).

asked Sep 30 '17 22:09 by Stephen

People also ask

What is CV in logistic regression?

cv : int or cross-validation generator, default=None. The default cross-validation generator used is Stratified K-Folds. If an integer is provided, it is the number of folds used. See the sklearn.model_selection module for the list of possible cross-validation objects.
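For illustration only, a minimal sketch of the two ways of passing cv (my own example, not part of the scraped answer; the dataset and fold count are arbitrary choices):

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.model_selection import StratifiedKFold

    X, y = load_breast_cancer(return_X_y=True)

    # cv as an integer: that many stratified folds
    clf_int = LogisticRegressionCV(cv=5, max_iter=5000).fit(X, y)

    # cv as an explicit cross-validation generator (equivalent here)
    clf_gen = LogisticRegressionCV(cv=StratifiedKFold(n_splits=5), max_iter=5000).fit(X, y)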

What is CS in LogisticRegressionCV?

scikit-learn's LogisticRegressionCV method includes a parameter Cs. If supplied a list, Cs is the list of candidate hyperparameter values to select from. If supplied an integer, a list of that many candidate values is drawn from a logarithmic scale between 0.0001 and 10000 (a range of reasonable values for C).
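A minimal sketch of the two forms of Cs (my own example, not from the snippet; the dataset choice is arbitrary):

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegressionCV

    X, y = load_breast_cancer(return_X_y=True)

    # Cs as an integer: 10 candidate values spaced logarithmically between 1e-4 and 1e4
    clf_auto = LogisticRegressionCV(Cs=10, cv=5, max_iter=5000).fit(X, y)

    # Cs as an explicit list of candidate values
    clf_list = LogisticRegressionCV(Cs=[0.01, 0.1, 1.0, 10.0], cv=5, max_iter=5000).fit(X, y)

    print(clf_auto.Cs_)  # the grid that was actually searched
    print(clf_list.C_)   # the C selected by cross-validation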

What is Sklearn Linear_model?

sklearn.linear_model is a module of the sklearn package that contains different classes for performing machine learning with linear models. The term linear model implies that the model is specified as a linear combination of features.

What is Newton CG solver?

  • newton-cg: a Newton-type solver that uses second-order (Hessian) information, which can be computationally expensive in high dimensions.
  • sag: Stochastic Average Gradient descent; a more efficient solver for large datasets.
  • saga: a variant of sag that can also be used with l1 regularization.
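For illustration (my own sketch, not from the snippet; the dataset, scaling and penalty choices are assumptions), picking a solver and a compatible penalty:

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X = StandardScaler().fit_transform(X)  # sag/saga converge much faster on scaled data

    # lbfgs/newton-cg/sag handle the default l2 penalty
    clf_l2 = LogisticRegressionCV(solver='lbfgs', penalty='l2', cv=5, max_iter=5000).fit(X, y)

    # saga additionally supports l1 regularization
    clf_l1 = LogisticRegressionCV(solver='saga', penalty='l1', cv=5, max_iter=5000).fit(X, y)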


2 Answers

You are right in guessing that the latter allows the user to perform cross-validation. The user can pass the number of folds as the cv argument to perform k-fold cross-validation (by default a stratified split via StratifiedKFold is used).

I would recommend reading the documentation for both LogisticRegression and LogisticRegressionCV.
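To make that concrete, a small sketch of the built-in cross-validation (my own, not the answerer's; the dataset and fold count are arbitrary):

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegressionCV

    X, y = load_breast_cancer(return_X_y=True)

    # cv=5 -> 5-fold stratified cross-validation over the Cs grid, done inside fit()
    clf = LogisticRegressionCV(cv=5, max_iter=5000).fit(X, y)

    print(clf.C_)       # the regularization strength selected by cross-validation
    print(clf.scores_)  # per-class dict of (n_folds, n_Cs) score arrays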

answered Oct 22 '22 02:10 by Sanyam Mehra


Yes, it's cross-validation. Excerpt from the docs:

For the grid of Cs values (that are set by default to be ten values in a logarithmic scale between 1e-4 and 1e4), the best hyperparameter is selected by the cross-validator StratifiedKFold, but it can be changed using the cv parameter.

The point here is the following:

  • yes: sklearn has general model-selection wrappers providing CV functionality for all those classifiers/regressors
  • but: when the classifier/regressor (and sometimes even the CV scheme) is known/fixed a priori (to some extent), specialized code bound to that one classifier/regressor can exploit this and achieve better performance than the generic wrapper!
    • Typically:
      • CV is embedded directly in the optimization algorithm
      • efficient warm-starting (instead of a full re-optimization after changing just one parameter such as alpha)

It seems that at least the latter idea is used in sklearn's LogisticRegressionCV, as this excerpt from the docs shows:

In the case of newton-cg and lbfgs solvers, we warm start along the path i.e guess the initial coefficients of the present fit to be the coefficients got after convergence in the previous fit, so it is supposed to be faster for high-dimensional dense data.
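To see the contrast the answer is drawing, a rough comparison of the generic wrapper route versus the specialized estimator (my own sketch, not part of the answer; the dataset, grid and fold count are arbitrary):

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
    from sklearn.model_selection import GridSearchCV

    X, y = load_breast_cancer(return_X_y=True)
    Cs = np.logspace(-4, 4, 10)

    # generic wrapper: refits LogisticRegression from scratch for every (C, fold) pair
    grid = GridSearchCV(LogisticRegression(max_iter=5000), {'C': list(Cs)}, cv=5).fit(X, y)

    # specialized estimator: same search, but CV is embedded and coefficients are
    # warm-started along the C path (for solvers such as newton-cg and lbfgs)
    cv_clf = LogisticRegressionCV(Cs=Cs, cv=5, max_iter=5000).fit(X, y)

    print(grid.best_params_['C'], cv_clf.C_)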

answered Oct 22 '22 03:10 by sascha