Not sure if this is a great place for this question, but I was told CrossValidated was not. So, all these questions refer to sklearn, but if you have insights into logistic regression in general, I'd love to hear them as well.
1) Does the data have to be standardized (mean 0, standard deviation 1)?
2) In sklearn, how do I specify what kind of regularization I want (L1 vs L2)? Note that this is different from penalty; penalty refers to classification error, not a penalty on the coefficients.
3) How can I use it to also do variable selection? I.e., analogously to lasso for linear regression.
4) When using regularization, how do I optimize for C, the regularization strength? Is there something built-in, or do I have to take care of this myself?
Probably an example would be most helpful, but I'd appreciate any insights on any of these questions.
This has been my starting point: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
Thank you very much in advance!
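1) For logistic regression, not strictly. You are not computing distances between instances, so an unregularized fit is unaffected by feature scale (the coefficients simply rescale). With regularization, which scikit-learn applies by default, the penalty depends on the size of the coefficients, so standardizing the features is usually a good idea.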
2) You can specify the penalty='l1' or penalty='l2' parameter. Despite the name, this parameter is the regularization penalty on the coefficients. See the LogisticRegression page. The L2 penalty is the default.
3) There are various explicit feature selection techniques that scikit-learn provides, e.g. using SelectKBest with a chi2 ranking function.
4) You will want to do a grid search over C (e.g. with GridSearchCV) to find the optimal value.
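To make point 4 concrete, here is a minimal sketch of a grid search over C. The synthetic dataset, the candidate values for C, and the choice of 5-fold cross-validation with accuracy scoring are all assumptions made for illustration, not recommendations. The scaler sits inside the pipeline so that it is re-fit on each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration (assumption: replace with your own X, y).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Standardize, then fit a logistic regression with the default L2 penalty.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# make_pipeline names each step after its lowercased class name,
# hence the "logisticregression__" prefix. The grid values are arbitrary.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0, 100.0]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_C = search.best_params_["logisticregression__C"]
print("best C:", best_C)
print("mean cross-validated accuracy:", search.best_score_)
```

Note that in scikit-learn a smaller C means stronger regularization. As a built-in alternative, LogisticRegressionCV in sklearn.linear_model searches over C internally.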
For more detail on all these questions, I suggest going through some of the Examples, e.g. this one and this one.
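As a rough illustration of points 2 and 3, the sketch below fits an L1-penalized model and, separately, runs SelectKBest with chi2. The synthetic data, C=0.1, k=5, and the liblinear solver are assumptions chosen for the example. Not every solver supports the L1 penalty, and parameter names have changed across releases, so check the documentation for your installed version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic data, purely for illustration (assumption: replace with your own X, y).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Point 2: request an L1 penalty on the coefficients.
# liblinear is one solver that supports L1; C=0.1 is an arbitrary choice.
X_std = StandardScaler().fit_transform(X)
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l1_model.fit(X_std, y)

# Point 3, lasso-style: L1 can drive coefficients to exactly zero,
# so the features with non-zero coefficients are the ones the model kept.
kept_by_l1 = np.flatnonzero(l1_model.coef_[0])
print("features kept by L1:", kept_by_l1)

# Point 3, explicit selection: chi2 requires non-negative features,
# so rescale to [0, 1] first. k=5 is an arbitrary choice.
X_pos = MinMaxScaler().fit_transform(X)
selector = SelectKBest(chi2, k=5)
X_top = selector.fit_transform(X_pos, y)
kept_by_chi2 = selector.get_support(indices=True)
print("features kept by chi2:", kept_by_chi2)
```

The two approaches need not agree: the L1 model selects features jointly while fitting, whereas chi2 scores each feature on its own.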