What is the meaning of C
parameter in sklearn.linear_model.LogisticRegression
? How does it affect the decision boundary? Do high values of C
make the decision boundary non-linear? How does overfitting look like for logistic regression if we visualize the decision boundary?
From the documentation:
C: float, default=1.0 Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
If you don't understand that, Cross Validated may be a better place to ask than here.
While CS people will often refer to all the arguments to a function as "parameters", in machine learning, C is referred to as a "hyperparameter". The parameters are numbers that tells the model what to do with the features, while hyperparameters tell the model how to choose parameters.
Regularization generally refers the concept that there should be a complexity penalty for more extreme parameters. The idea is that just looking at the training data and not paying attention to how extreme one's parameters are leads to overfitting. A high value of C tells the model to give high weight to the training data, and a lower weight to the complexity penalty. A low value tells the model to give more weight to this complexity penalty at the expense of fitting to the training data. Basically, a high C means "Trust this training data a lot", while a low value says "This data may not be fully representative of the real world data, so if it's telling you to make a parameter really large, don't listen to it".
https://en.wikipedia.org/wiki/Regularization_(mathematics)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With