The scikit-learn package provides the functions Lasso()
and LassoCV()
but no option to fit a logistic function instead of a linear one. How do I perform logistic lasso in Python?
We can use LASSO to reduce overfitting by selecting features. It works with linear regression, logistic regression, and several other models; essentially, if the model has coefficients, LASSO can be applied.
In Python, lasso regression can be performed using the Lasso class from the sklearn.linear_model library. The Lasso class takes a parameter called alpha, which sets the strength of the regularization term: a higher alpha value results in a stronger penalty, and therefore fewer features being used in the model.
LASSO is a penalized regression approach that estimates the regression coefficients by maximizing the log-likelihood function (or, equivalently for linear regression, minimizing the sum of squared residuals) subject to the constraint that the sum of the absolute values of the regression coefficients, ∑_{j=1}^{k} |β_j|, is less than or equal to a positive constant s.
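A quick way to see the effect of that constraint is to sweep alpha and count how many coefficients are driven exactly to zero. This is a minimal sketch on a synthetic dataset (make_regression and the alpha values here are illustrative choices, not from the answer above):

```python
# Sketch of the alpha/sparsity trade-off: higher alpha -> stronger L1
# penalty -> more coefficients exactly zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

for alpha in (0.1, 1.0, 10.0):
    lasso = Lasso(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(lasso.coef_ == 0.0))
    print(f"alpha={alpha}: {n_zero}/{lasso.coef_.size} coefficients are exactly zero")
```

Inspecting `lasso.coef_` after the fit shows which features survived the penalty.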
The Lasso optimizes a least-squares problem with an L1 penalty. By definition you can't optimize a logistic function with the Lasso.
If you want to optimize a logistic function with an L1 penalty, you can use the LogisticRegression
estimator with the L1 penalty:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
log = LogisticRegression(penalty='l1', solver='liblinear')
log.fit(X, y)
Note that only the LIBLINEAR and SAGA (added in v0.19) solvers handle the L1 penalty.
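One detail worth flagging as a sketch: in scikit-learn, C is the *inverse* of the regularization strength, so a smaller C gives a stronger L1 penalty and more coefficients forced exactly to zero. The C grid below is an arbitrary illustration:

```python
# Smaller C = stronger L1 penalty = sparser coefficient matrix.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

for C in (10.0, 1.0, 0.1):
    clf = LogisticRegression(penalty='l1', solver='liblinear', C=C).fit(X, y)
    n_zero = int(np.sum(clf.coef_ == 0.0))
    print(f"C={C}: {n_zero}/{clf.coef_.size} coefficients are exactly zero")
```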
sklearn.linear_model.LogisticRegression
from scikit-learn is probably the best:
as @TomDLT said, Lasso
is for the least squares (regression) case, not logistic (classification).
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(
    penalty='l1',
    solver='saga',  # or 'liblinear'
    C=regularization_strength,  # note: C is the *inverse* of the regularization strength
)
model.fit(x, y)
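Since the question also mentions LassoCV, it may be worth noting that LogisticRegressionCV plays the analogous role for classification: it cross-validates over a grid of C values and supports the L1 penalty. A minimal sketch (the iris data and the Cs/cv settings here are illustrative assumptions):

```python
# LogisticRegressionCV: the closest logistic analogue of LassoCV.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegressionCV

X, y = load_iris(return_X_y=True)

model = LogisticRegressionCV(
    Cs=10,               # grid of 10 C values (C = 1 / regularization strength)
    penalty='l1',
    solver='liblinear',  # 'liblinear' and 'saga' support the L1 penalty
    cv=5,
).fit(X, y)

print("chosen C per class:", model.C_)
```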
glmnet.LogitNet
You can also use Civis Analytics' python-glmnet library. This implements the scikit-learn BaseEstimator
API:
# source: https://github.com/civisanalytics/python-glmnet#regularized-logistic-regression
from glmnet import LogitNet
m = LogitNet(
alpha=1, # 0 <= alpha <= 1, 0 for ridge, 1 for lasso
)
m = m.fit(x, y)
I'm not sure how to adjust the penalty with LogitNet, but I'll let you figure that out.
You can also take a fully Bayesian approach. Rather than using L1-penalized optimization to find a point estimate for your coefficients, you can approximate the distribution of your coefficients given your data. The MAP (maximum a posteriori) estimate under a Laplace prior coincides with the L1-penalized maximum-likelihood estimate, because the Laplace prior induces sparsity.
The PyMC folks have a tutorial here on setting something like that up. Good luck.
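For the record, the Laplace-prior equivalence can be made precise in two lines. With an independent Laplace(0, b) prior on each coefficient, the log-posterior is

log p(β | X, y) = log p(y | X, β) + ∑_{j=1}^{k} log Laplace(β_j; 0, b) + const
                = ℓ(β) − (1/b) ∑_{j=1}^{k} |β_j| + const,

so maximizing it (the MAP estimate) is exactly L1-penalized maximum-likelihood estimation with penalty strength λ = 1/b; a flatter prior (larger b) means a weaker penalty.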