I'm pretty sure this has been asked before, but I'm unable to find an answer.
Running logistic regression with sklearn in Python, I'm able to transform my dataset to its most important features using the transform method:
from sklearn import linear_model

classf = linear_model.LogisticRegression()
func = classf.fit(Xtrain, ytrain)
# transform() selects features based on coefficient magnitude
# (this method was removed in later scikit-learn versions; SelectFromModel is the replacement)
reduced_train = func.transform(Xtrain)
How can I tell which features were selected as most important? More generally, how can I calculate the p-value of each feature in the dataset?
Take the iris dataset as an example: its features are sepal length, sepal width, petal length and petal width, and its target classes are setosa, versicolor and virginica. Because the target has 3 classes, logistic regression in the one-vs-rest setting builds 3 separate binary classification models, one per class.
Logistic regression feature importance: we can fit a LogisticRegression model on the dataset and retrieve the coef_ property, which contains the coefficients found for each input feature. These coefficients can provide the basis for a crude feature importance score, as in the sketch below.
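To illustrate both points, here is a minimal sketch on the iris dataset (loading it through scikit-learn's load_iris is an assumption; any array with the four features would do):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

print(clf.coef_.shape)  # (3, 4): one row of coefficients per class
# Absolute coefficient size as a crude per-class importance score
for cls, coefs in zip(iris.target_names, clf.coef_):
    print(cls, dict(zip(iris.feature_names, abs(coefs).round(2))))

Coefficient magnitudes are only directly comparable when the features are on similar scales, which is the point of the scaling answer further below.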
The permutation feature importance is defined to be the decrease in a model score when a single feature value is randomly shuffled [1]. This procedure breaks the relationship between the feature and the target, thus the drop in the model score is indicative of how much the model depends on the feature.
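scikit-learn implements this as permutation_importance in sklearn.inspection. A minimal sketch on the same iris model (n_repeats=10 and random_state=0 are arbitrary choices for illustration, and ideally you would score on a held-out set rather than the training data):

from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

# Shuffle each feature column n_repeats times and record the drop in accuracy
result = permutation_importance(clf, iris.data, iris.target,
                                n_repeats=10, random_state=0)
for name, mean, std in zip(iris.feature_names,
                           result.importances_mean,
                           result.importances_std):
    print(f"{name}: mean drop in accuracy {mean:.3f} (+/- {std:.3f})")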
The Rule of 10 is not a rule that specifies how many features you are permitted to use. It is descriptive, not prescriptive, and only an approximate guideline: if the number of instances is much smaller than 10 times the number of features (for example, fewer than about 40 instances for 4 features), you're at especially high risk of overfitting and may get poor results.
As suggested in the comments above, you can (and should) scale your data before fitting, which makes the coefficients comparable. Below is a little code showing how this would work, in a format that makes the comparison easy to read.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
import pandas as pd
import matplotlib.pyplot as plt
# Simulate three independent standard-normal features
x1 = np.random.randn(100)
x2 = np.random.randn(100)
x3 = np.random.randn(100)
# Make the target depend on the features with different strengths
y = (3 + x1 + 2*x2 + 5*x3 + 0.2*np.random.randn(100)) > 0
X = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})
# Scale your data
scaler = StandardScaler()
scaler.fit(X)
X_scaled = pd.DataFrame(scaler.transform(X), columns=X.columns)
clf = LogisticRegression(random_state=0)
clf.fit(X_scaled, y)

# Use the absolute value of each coefficient as an importance score,
# scaled so the most important feature is 100
feature_importance = abs(clf.coef_[0])
feature_importance = 100.0 * (feature_importance / feature_importance.max())

# Order the features from least to most important for plotting
sorted_idx = np.argsort(feature_importance)
pos = np.arange(sorted_idx.shape[0]) + .5
# Horizontal bar chart of the relative importances
featfig = plt.figure()
featax = featfig.add_subplot(1, 1, 1)
featax.barh(pos, feature_importance[sorted_idx], align='center')
featax.set_yticks(pos)
featax.set_yticklabels(np.array(X.columns)[sorted_idx], fontsize=8)
featax.set_xlabel('Relative Feature Importance')
plt.tight_layout()
plt.show()