
Which coefficients go to which class in multiclass logistic regression in scikit-learn?

I'm using scikit-learn's LogisticRegression for a multiclass problem.

logit = LogisticRegression(penalty='l1')
logit = logit.fit(X, y)

I'm interested in which features are driving this decision.

logit.coef_

The above gives me a nice array of shape (n_classes, n_features), but all the class and feature names are gone. With features, that's okay, because it seems safe to assume they're indexed in the same order I passed them in...

But with classes it's a problem, since I never explicitly passed the classes in any order. So which class do coefficient sets (rows in the array) 0, 1, 2, and 3 belong to?

Alex Lenail asked Apr 25 '17 23:04




1 Answer

The order is the same as in logit.classes_ (classes_ is an attribute of the fitted model that holds the unique classes present in y). The classes are stored in sorted order, so string labels come out in lexicographic order.

To illustrate, here is a set of string labels y fitted on a random dataset with LogisticRegression:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(45,5)
y = np.array(['GR3', 'GR4', 'SHH', 'GR3', 'GR4', 'SHH', 'GR4', 'SHH',
              'GR4', 'WNT', 'GR3', 'GR4', 'GR3', 'SHH', 'SHH', 'GR3', 
              'GR4', 'SHH', 'GR4', 'GR3', 'SHH', 'GR3', 'SHH', 'GR4', 
              'SHH', 'GR3', 'GR4', 'GR4', 'SHH', 'GR4', 'SHH', 'GR4', 
              'GR3', 'GR3', 'WNT', 'SHH', 'GR4', 'SHH', 'SHH', 'GR3',
              'WNT', 'GR3', 'GR4', 'GR3', 'SHH'], dtype=object)

lr = LogisticRegression()
lr.fit(X,y)

# This is what you want
lr.classes_

#Out:
#    array(['GR3', 'GR4', 'SHH', 'WNT'], dtype=object)

lr.coef_
#Out:
#    array of shape (n_classes, n_features), here (4, 5)

So in the coef_ matrix, row 0 corresponds to 'GR3' (the first class in the classes_ array), row 1 to 'GR4', and so on.
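If you want the labels attached to the coefficients, one option is to wrap coef_ in a pandas DataFrame indexed by classes_. This is only a minimal sketch: the data is random, the label array is a made-up stand-in with the same four classes, and the feature names (feature_0, feature_1, ...) are hypothetical placeholders for whatever your real columns are called.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Random stand-in data: 45 samples, 5 features, 4 string classes.
rng = np.random.default_rng(0)
X = rng.random((45, 5))
y = np.array(['GR3', 'GR4', 'SHH'] * 14 + ['WNT'] * 3, dtype=object)

# Hypothetical feature names; replace with your own column names.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

lr = LogisticRegression().fit(X, y)

# Row i of coef_ belongs to classes_[i], so classes_ can serve as the index.
coef_table = pd.DataFrame(lr.coef_, index=lr.classes_, columns=feature_names)
print(coef_table)
```

After that, coef_table.loc['SHH'] returns the coefficient row for class 'SHH'. Note that this assumes more than two classes: for a binary problem coef_ has shape (1, n_features) and its single row corresponds to classes_[1], so indexing by the full classes_ array would not work there.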

Hope it helps.

Vivek Kumar answered Oct 19 '22 12:10
