I'm looking for a good implementation of logistic regression (not regularized) in Python. I'm looking for a package that can also take a weight for each vector (sample). Can anyone suggest a good implementation / package? Thanks!
Weighted logistic regression is useful when you have an imbalanced dataset. For example, suppose you have a dataset of patient details and need to predict whether a patient has cancer. Such datasets are generally imbalanced, because far fewer patients have cancer than not.
To specify weights, use the class_weight hyperparameter of LogisticRegression. It accepts a dictionary of the form {class_label: weight} that defines the weight of each label. If it is not given, all classes are assumed to have a weight of one.
Logistic regression aims to solve classification problems. It does this by predicting categorical outcomes, unlike linear regression, which predicts a continuous outcome. In the simplest case there are two outcomes, which is called binomial (binary) logistic regression; an example is predicting whether a tumor is malignant or benign.
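As a rough sketch of the class_weight dictionary in use (the synthetic data and the 1:10 weighting below are assumptions chosen only for illustration, not values from the text above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic imbalanced data (an assumption for illustration): few positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 1.3).astype(int)

# Errors on class 1 count ten times as much as errors on class 0.
weighted = LogisticRegression(class_weight={0: 1, 1: 10}).fit(X, y)
unweighted = LogisticRegression().fit(X, y)

# Upweighting the rare class typically makes the model predict it more often.
n_pos_weighted = int(weighted.predict(X).sum())
n_pos_unweighted = int(unweighted.predict(X).sum())
print(n_pos_weighted, n_pos_unweighted)
```

Note that LogisticRegression applies an L2 penalty by default, so this sketch is regularized unless you change that setting.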
Logistic Regression in Python With StatsModels: Example
You can also implement logistic regression in Python with the StatsModels package. Typically, you want this when you need more statistical details related to models and results.
I notice that this question is quite old now, but hopefully this can help someone. With sklearn, you can use the SGDClassifier class to create a logistic regression model by passing 'log_loss' as the loss (it was called 'log' before scikit-learn 1.1, and the old name has since been removed). SGDClassifier applies an L2 penalty by default, so pass penalty=None if you want an unregularized fit:
sklearn.linear_model.SGDClassifier(loss='log_loss', penalty=None, ...)
This class supports weighted samples in its fit() function:
classifier.fit(X, Y, sample_weight=weights)
where weights is an array containing the sample weights, which must be the same length as the number of data points in X.
See http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.SGDClassifier.html for full documentation.
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, computed as n_samples / (n_classes * np.bincount(y)):
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(class_weight='balanced')
model = model.fit(X, y)
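To make the "balanced" formula concrete, here is a small worked example. The label vector is made up for illustration; compute_class_weight is scikit-learn's helper in sklearn.utils.class_weight, used here only to check the hand calculation.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up labels: eight samples of class 0 and two of class 1.
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

n_samples = len(y)
n_classes = len(np.unique(y))

# The formula quoted above, applied by hand:
# class 0 -> 10 / (2 * 8) = 0.625, class 1 -> 10 / (2 * 2) = 2.5
manual = n_samples / (n_classes * np.bincount(y))

# The same weights as computed by scikit-learn's helper.
auto = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)

print(manual, auto)
```

The rare class ends up with a weight four times that of the common class, which is exactly the inverse of the 8:2 frequency ratio.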
EDIT
Sample weights can be passed to the fit method through the sample_weight argument. You just have to pass an array of length n_samples. Check out the documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit
Hope this does it...
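A short sketch of per-sample weights with LogisticRegression, under a few assumptions: the data is synthetic, and a very large C (1e6) stands in for "not regularized", since C is the inverse regularization strength. The second fit only illustrates that integer weights act like repeated rows.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic, noisy (non-separable) data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + rng.normal(size=100) > 0).astype(int)

# One integer weight (1, 2 or 3) per sample.
sample_weight = rng.integers(1, 4, size=100)

# A very large C makes the default L2 penalty negligible.
model = LogisticRegression(C=1e6, max_iter=1000)
model.fit(X, y, sample_weight=sample_weight)

# Integer weights behave like repeating each row that many times.
X_rep = np.repeat(X, sample_weight, axis=0)
y_rep = np.repeat(y, sample_weight)
model_rep = LogisticRegression(C=1e6, max_iter=1000).fit(X_rep, y_rep)

print(model.coef_, model_rep.coef_)
```

The two sets of coefficients should agree up to the solver's tolerance.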
I think what you want is statsmodels. It has great support for GLM and other linear methods. If you're coming from R, you'll find the syntax very familiar.
statsmodels weighted regression
getting started w/ statsmodels