I'm using scikit-learn's LogisticRegression for a multiclass problem.
logit = LogisticRegression(penalty='l1', solver='liblinear')  # 'l1' needs a solver that supports it, e.g. liblinear or saga
logit = logit.fit(X, y)
I'm interested in which features are driving this decision.
logit.coef_
The above gives me a nice array of shape (n_classes, n_features), but the class and feature names are gone. For features that's okay, because it seems safe to assume they're indexed in the same order I passed them in...
But with classes it's a problem, since I never explicitly passed the classes in any particular order. So which class do coefficient sets (rows of the array) 0, 1, 2, and 3 belong to?
Logistic regression is inherently a two-class model: the target is modeled with a binomial probability distribution, the class labels are mapped to 1 for the positive class and 0 for the negative class, and the fitted model predicts the probability that an example belongs to class 1. The coefficients live on the log-odds scale. In linear regression, if we were using GPA to predict test scores, a coefficient of 10 for GPA would mean that each one-point increase in GPA predicts a 10-point increase on the test; in logistic regression, a coefficient instead gives the change in the log odds of the positive class for a one-unit increase in that feature.
Logistic regression can still be applied to a problem with three or more classes: one common approach is the one-vs-rest scheme, which fits one binary classifier per class.
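To make the one-vs-rest idea concrete, here is a small sketch (class labels `'a'`, `'b'`, `'c'` and the random data are made up for illustration). Wrapping `LogisticRegression` in `OneVsRestClassifier` fits one binary model per class, and each sub-model's coefficients match what you'd get by fitting a plain binary model of that class against everything else:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.RandomState(0)
X = rng.rand(60, 4)
y = rng.choice(['a', 'b', 'c'], size=60)

# One-vs-rest: one binary logistic regression per class,
# sub-models stored in estimators_ in the order of classes_.
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)

# The first sub-model is equivalent to a binary fit of
# "is it classes_[0]?" vs. everything else.
binary_a = LogisticRegression().fit(X, y == ovr.classes_[0])
print(np.allclose(ovr.estimators_[0].coef_, binary_a.coef_))
```

This is essentially what a multiclass `LogisticRegression` does internally when it uses the one-vs-rest strategy, which is why it ends up with one row of coefficients per class.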
The order will be the same as returned by logit.classes_
(classes_ is an attribute of the fitted model holding the unique classes present in y); for string labels they are sorted alphabetically.
To demonstrate, we fit a LogisticRegression on a random dataset with the string labels y below:
import numpy as np
from sklearn.linear_model import LogisticRegression
X = np.random.rand(45,5)
y = np.array(['GR3', 'GR4', 'SHH', 'GR3', 'GR4', 'SHH', 'GR4', 'SHH',
'GR4', 'WNT', 'GR3', 'GR4', 'GR3', 'SHH', 'SHH', 'GR3',
'GR4', 'SHH', 'GR4', 'GR3', 'SHH', 'GR3', 'SHH', 'GR4',
'SHH', 'GR3', 'GR4', 'GR4', 'SHH', 'GR4', 'SHH', 'GR4',
'GR3', 'GR3', 'WNT', 'SHH', 'GR4', 'SHH', 'SHH', 'GR3',
'WNT', 'GR3', 'GR4', 'GR3', 'SHH'], dtype=object)
lr = LogisticRegression()
lr.fit(X,y)
# This is what you want
lr.classes_
#Out:
# array(['GR3', 'GR4', 'SHH', 'WNT'], dtype=object)
lr.coef_
#Out:
# array of shape [n_classes, n_features]
So in the coef_
matrix, row index 0 represents 'GR3' (the first class in the classes_
array), row 1 represents 'GR4', and so on.
Hope it helps.