I'm looking for a good implementation of logistic regression (not regularized) in Python. I'm looking for a package that can also take a weight for each vector (sample). Can anyone suggest a good implementation / package? Thanks!
Weighted logistic regression is commonly used when you have an imbalanced dataset. Let's look at an example: suppose you have a dataset of patient records and need to predict whether a patient has cancer or not. Such datasets are generally imbalanced, because patients with cancer are usually a small minority of the records.
To specify weights, we can use the class_weight hyperparameter of scikit-learn's LogisticRegression. class_weight is a dictionary that maps each class label to its weight, in the form {class_label: weight}. If it is not given, all classes have the same weight (one).
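A minimal sketch of the class_weight dictionary, assuming scikit-learn and NumPy and made-up toy data (the arrays and the weight 9.0 are illustrative only). It also checks that a class_weight dict gives the same fit as per-sample weights built from the labels, since scikit-learn multiplies each sample's weight by its class weight. Note that LogisticRegression applies L2 regularization by default; for the unregularized fit asked about in the question, recent versions (1.2+) accept penalty=None.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical imbalanced toy data: 90 negatives, 10 positives, 2 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 2)),
               rng.normal(1.5, 1.0, size=(10, 2))])
y = np.array([0] * 90 + [1] * 10)

# Up-weight the rare positive class 9x (9.0 is an illustrative choice).
weighted = LogisticRegression(class_weight={0: 1.0, 1: 9.0}).fit(X, y)

# The same weighting expressed as per-sample weights derived from the labels.
sample_weight = np.where(y == 1, 9.0, 1.0)
per_sample = LogisticRegression().fit(X, y, sample_weight=sample_weight)

print(weighted.coef_, weighted.intercept_)
print(np.allclose(weighted.coef_, per_sample.coef_, atol=1e-4))
```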
Logistic regression aims to solve classification problems. It does this by predicting categorical outcomes, unlike linear regression, which predicts a continuous outcome. In the simplest case there are two outcomes, which is called binary (or binomial) logistic regression; an example is predicting whether a tumor is malignant or benign.
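For reference, the binary model and the weighted log-likelihood it maximizes can be written as follows (standard notation, not taken from the original answers). Here w_i is the weight of sample i: setting every w_i = 1 gives ordinary, unweighted logistic regression, and class weights correspond to giving every sample of class c the same weight.

```latex
p_i = P(y_i = 1 \mid \mathbf{x}_i) = \frac{1}{1 + \exp\!\big(-(\beta_0 + \boldsymbol{\beta}^\top \mathbf{x}_i)\big)}

\ell_w(\beta_0, \boldsymbol{\beta}) = \sum_{i=1}^{n} w_i \big[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\big]
```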
Logistic Regression in Python With StatsModels: Example
You can also implement logistic regression in Python with the StatsModels package. Typically, you want this when you need more statistical detail about the model and its results, such as standard errors and p-values.
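I notice that this question is quite old now, but hopefully this can help someone. With sklearn, you can use the SGDClassifier class to create a logistic regression model by passing the logistic loss as the loss. Older versions called it 'log'; scikit-learn 1.1+ calls it 'log_loss', and the old name has since been removed:
```python
sklearn.linear_model.SGDClassifier(loss='log_loss', ...)
```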
This class supports per-sample weights through the sample_weight argument of fit():
```python
classifier.fit(X, Y, sample_weight=weights)
```
where weights is an array of sample weights whose length must equal the number of samples (rows) in X.
See http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.SGDClassifier.html for full documentation.
LogisticRegression also accepts class_weight='balanced'. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)):
```python
from sklearn.linear_model import LogisticRegression

# X: feature matrix, y: class labels (assumed to be defined already)
model = LogisticRegression(class_weight='balanced')
model = model.fit(X, y)
```
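To make the "balanced" formula concrete, here is a small worked example, assuming NumPy and scikit-learn's compute_class_weight helper, with a hypothetical label vector of 8 negatives and 2 positives. With 10 samples and 2 classes, the weights are 10 / (2 * 8) = 0.625 for class 0 and 10 / (2 * 2) = 2.5 for class 1, so the rare class is up-weighted.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 8 negatives, 2 positives.
y = np.array([0] * 8 + [1] * 2)
classes = np.unique(y)

# The "balanced" formula written out by hand.
manual = len(y) / (len(classes) * np.bincount(y))  # [0.625, 2.5]

# scikit-learn's helper implementing the same heuristic.
helper = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(manual, helper)
```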
EDIT
Sample weights can also be passed to the fit method through its sample_weight argument: just pass an array of length n_samples. Check out the documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit
Hope this does it...
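A minimal sketch of the sample_weight route, assuming scikit-learn and NumPy with synthetic data. It illustrates one way to read integer sample weights: giving a row weight 2 should give the same coefficients as including that row twice, because both define the same objective. The tight tol is there only so the two fits can be compared numerically; the weights themselves are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data with label noise (so the classes are not separable).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(size=100) > 0).astype(int)

# Hypothetical weights: the first 20 samples count twice.
w = np.ones(len(y))
w[:20] = 2.0

# Tight tolerance so both fits converge to (numerically) the same optimum.
opts = dict(tol=1e-10, max_iter=10_000)
weighted = LogisticRegression(**opts).fit(X, y, sample_weight=w)

# Equivalent unweighted fit: physically duplicate the first 20 rows.
X_dup = np.vstack([X, X[:20]])
y_dup = np.concatenate([y, y[:20]])
duplicated = LogisticRegression(**opts).fit(X_dup, y_dup)

print(weighted.coef_, duplicated.coef_)
```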
I think what you want is statsmodels. It has great support for GLM and other linear methods. If you're coming from R, you'll find the syntax very familiar.
statsmodels weighted regression
getting started w/ statsmodels