
What is the inverse of regularization strength in Logistic Regression? How should it affect my code?

I am using sklearn.linear_model.LogisticRegression in scikit-learn to run a logistic regression. The documentation describes the C parameter as follows:

C : float, optional (default=1.0)
    Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

What does C mean here, in simple terms? What is regularization strength?

asked Apr 04 '14 by user3427495

People also ask

What is inverse of regularization strength?

LogisticRegression uses the inverse of the regularization strength as its regularization parameter, so C = 1/λ. In a different package you might set λ = 10; with this class you would get an equivalent result with C = 0.1.
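As a quick sketch of that equivalence (the make_classification toy dataset and the specific λ value are just illustrative choices):

    # C in scikit-learn's LogisticRegression is the inverse of the
    # regularization strength lambda, so lambda = 10 corresponds to C = 0.1.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    lam = 10.0                              # regularization strength, "lambda"
    clf = LogisticRegression(C=1.0 / lam)   # equivalent to setting lambda = 10
    clf.fit(X, y)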

How does regularization affect logistic regression?

Comparing the coefficients of a logistic regression fit with regularization against one fit without it shows that regularization leads to smaller coefficient values, as we would expect, bearing in mind that regularization penalizes high coefficients.
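A small sketch of this effect (the dataset and the particular C values are arbitrary assumptions; a very large C approximates an unregularized fit):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=300, n_features=10, random_state=0)

    # Large C ~ almost no regularization; small C ~ strong regularization.
    for C in (1e6, 1.0, 0.01):
        coef = LogisticRegression(C=C, max_iter=1000).fit(X, y).coef_
        print(f"C={C:g}  mean |coef| = {np.abs(coef).mean():.3f}")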

What happens if regularization parameter is high?

When the regularization parameter is too high and the learning rate too low, the cost increases. The likely culprit is the extra cost that the regularization penalty adds to the loss function.

What are the issues if the value of the regularization is too small or too large?

If we introduce too much regularization, we can underfit the training set and get worse performance even on the training data. Conversely, adding many new features gives us more expressive models that are better able to fit the training set; if regularization is too weak, such models can fit the training set too closely, which leads to overfitting.
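A rough way to see both failure modes with scikit-learn (the dataset and the C grid below are illustrative assumptions): very small C means strong regularization and tends to underfit, while very large C means weak regularization and can overfit:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Many features, few informative ones: easy to overfit without regularization.
    X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for C in (1e-3, 1e-1, 1e1, 1e3):
        clf = LogisticRegression(C=C, max_iter=5000).fit(X_tr, y_tr)
        print(f"C={C:g}  train={clf.score(X_tr, y_tr):.2f}  "
              f"test={clf.score(X_te, y_te):.2f}")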


1 Answer

Regularization applies a penalty to increasing the magnitude of parameter values in order to reduce overfitting. When you train a model such as a logistic regression model, you are choosing parameters that give you the best fit to the data. This means minimizing the error between what the model predicts for your dependent variable given your data and what your dependent variable actually is.

The problem comes when you have a lot of parameters (a lot of independent variables) but not much data. In this case, the model will often tailor the parameter values to idiosyncrasies in your data, which means it fits your data almost perfectly. However, because those idiosyncrasies don't appear in the future data you see, your model predicts poorly.

To solve this, in addition to minimizing the error as already discussed, you also minimize a function that penalizes large parameter values. Most often that function is λ Σⱼ θⱼ², some constant λ times the sum of the squared parameter values θⱼ². The larger λ is, the less likely it is that the parameters will be increased in magnitude simply to adjust for small perturbations in the data. In your case, however, rather than specifying λ, you specify C = 1/λ.
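To make the objective concrete, here is a minimal numpy sketch of a logistic loss with the L2 penalty λ Σⱼ θⱼ² added on. This illustrates the idea rather than reproducing scikit-learn's exact objective; the library applies the penalty with slightly different scaling conventions and excludes the intercept.

    import numpy as np

    def regularized_log_loss(theta, X, y, lam):
        """Mean logistic loss plus an L2 penalty lam * sum(theta_j ** 2).
        In scikit-learn's parameterization, lam corresponds to 1 / C."""
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))   # predicted probabilities
        p = np.clip(p, 1e-12, 1 - 1e-12)         # avoid log(0)
        data_term = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
        penalty = lam * np.sum(theta ** 2)       # grows with |theta| and lam
        return data_term + penalty

Increasing lam makes large values of theta more expensive, so the minimizer is pulled toward smaller coefficients; setting a small C in LogisticRegression has the same effect.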

answered Oct 14 '22 by TooTone