 

Weighted logistic regression in Python

I'm looking for a good implementation of logistic regression (not regularized) in Python, ideally a package that also accepts a weight for each sample vector. Can anyone suggest a good implementation or package? Thanks!

asked Sep 22 '11 at 10:09 by user5497


People also ask

What is weighted logistic regression?

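Weighted logistic regression is commonly used when the dataset is imbalanced, although the weights can encode any per-sample or per-class importance. For example, suppose you have a dataset of patient records and need to predict whether a patient has cancer. Such datasets are usually imbalanced: far fewer patients have cancer than not, so an unweighted model can reach high accuracy while rarely predicting the minority class.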
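To make the idea concrete, here is a minimal from-scratch sketch in NumPy, assuming plain batch gradient descent (the function names, learning rate and iteration count are illustrative choices, not from any package). Each sample's term in the negative log-likelihood is multiplied by its weight, and the loss is normalized by the total weight:

```python
import numpy as np


def sigmoid(z):
    # Logistic function: maps real-valued scores to probabilities in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))


def fit_weighted_logreg(X, y, sample_weight, lr=0.5, n_iter=5000):
    """Unregularized weighted logistic regression by batch gradient descent.

    Minimizes  -sum_i s_i [y_i log p_i + (1 - y_i) log(1 - p_i)] / sum_i s_i
    with p_i = sigmoid(x_i . w + b). Hypothetical helper for illustration.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.asarray(sample_weight, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    total = s.sum()
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        # Gradient of the weighted loss: each residual is scaled by its weight.
        r = s * (p - y) / total
        w -= lr * (X.T @ r)
        b -= lr * r.sum()
    return w, b


# Toy data (hypothetical) with integer weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (rng.random(200) < sigmoid(X @ np.array([1.5, -2.0]) + 0.3)).astype(int)
weights = rng.integers(1, 4, size=200)
w, b = fit_weighted_logreg(X, y, weights)

# An integer weight should act like repeating that row that many times.
X_rep = np.repeat(X, weights, axis=0)
y_rep = np.repeat(y, weights)
w_rep, b_rep = fit_weighted_logreg(X_rep, y_rep, np.ones(len(y_rep)))
```

The final check illustrates what a weight means: giving a row weight 3 has the same effect on the fit as including it three times.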

How are weights set in logistic regression?

In scikit-learn, class weights are specified with the class_weight hyperparameter of LogisticRegression. It is a dictionary of the form {class_label: weight} that defines the weight of each label; it can also be set to 'balanced'. If it is not given, all classes have weight one.
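A short sketch of class_weight on hypothetical imbalanced toy data. As far as I know, scikit-learn multiplies each sample's weight by the weight of its class, so the same model can also be expressed with per-sample weights passed to fit; the two fits below are expected to agree:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
# Hypothetical imbalanced labels: roughly 13% positives.
y = (X[:, 0] + rng.normal(scale=1.0, size=300) > 1.6).astype(int)

# Errors on the rare class 1 count 5 times as much as errors on class 0.
clf_cw = LogisticRegression(class_weight={0: 1, 1: 5}).fit(X, y)

# The same weighting expressed as one weight per sample.
sw = np.where(y == 1, 5.0, 1.0)
clf_sw = LogisticRegression().fit(X, y, sample_weight=sw)
```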

What is logistic regression in Python?

Logistic regression solves classification problems: unlike linear regression, which predicts a continuous outcome, it predicts a categorical one. In the simplest case there are two outcomes (binary, or binomial, logistic regression); an example is predicting whether a tumor is malignant or benign.
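For the two-outcome case, the model can be written as one equation (standard textbook notation; the symbols w, b and σ are not from the original): the probability of the positive class is the logistic function of a linear score, and the predicted class is 1 when that probability is at least 0.5, that is, when the score is non-negative.

```latex
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)
  = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}},
\qquad
\hat{y} = 1 \iff \mathbf{w}^\top \mathbf{x} + b \ge 0
```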

Can you do logistic regression in Python?

Yes. Besides scikit-learn, you can implement logistic regression in Python with the StatsModels package. You typically want StatsModels when you need more statistical detail about the model and its results, such as standard errors and p-values.


3 Answers

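I notice that this question is quite old now, but hopefully this can help someone. With sklearn, you can use the SGDClassifier class to create a logistic regression model simply by passing the logistic loss as the loss parameter. In scikit-learn 1.1 and later it is called 'log_loss'; older versions call it 'log':

sklearn.linear_model.SGDClassifier(loss='log_loss', ...)

Note that SGDClassifier applies an L2 penalty by default, so pass penalty=None if you want an unregularized model.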

This class supports per-sample weights through the fit() method:

classifier.fit(X, Y, sample_weight=weights)

where weights is an array of sample weights whose length must equal the number of data points (rows) in X.

See http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.SGDClassifier.html for full documentation.

answered Oct 07 '22 at 14:10 by William Darling


With scikit-learn's LogisticRegression you can set class_weight='balanced'. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)):

from sklearn.linear_model import LogisticRegression
model = LogisticRegression(class_weight='balanced')
model = model.fit(X, y)
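The "balanced" formula can be checked by hand on a tiny hypothetical label vector, using scikit-learn's compute_class_weight helper for comparison:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 0, 0, 0, 0, 0, 1, 1])   # toy labels: 6 negatives, 2 positives

# n_samples / (n_classes * np.bincount(y)): 8 / (2 * 6) and 8 / (2 * 2)
manual = len(y) / (len(np.unique(y)) * np.bincount(y))
auto = compute_class_weight(class_weight="balanced", classes=np.unique(y), y=y)
```

With these weights each class contributes the same total weight (n_samples / n_classes = 4 here), which is why the mode is called balanced.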

EDIT

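Per-sample weights can be passed to the fit method through its sample_weight argument: an array of length n_samples. Also note that LogisticRegression is L2-regularized by default; for an unregularized model, set penalty=None (penalty='none' before scikit-learn 1.2). See the documentation: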

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit

Hope this does it...

answered Oct 07 '22 at 12:10 by Vivek Kalyanarangan


I think what you want is statsmodels. It has great support for generalized linear models (GLM), which include logistic regression, and other linear methods. If you're coming from R, you'll find the syntax, including R-style formulas, very familiar.

statsmodels weighted regression

getting started w/ statsmodels

answered Oct 07 '22 at 12:10 by Greg