I have a binary prediction model trained with the logistic regression algorithm, and I want to know which features (predictors) are most important for the decision between the positive and negative class. I know scikit-learn exposes a coef_ attribute, but I don't know whether it is enough on its own to measure importance. Another question is how to interpret the coef_ values with respect to the negative and positive classes. I have also read about standardized regression coefficients, but I don't know what they are.
Say there are features like tumor size, tumor weight, etc., used to decide whether a test case is malignant or not. I want to know which of the features are more important for the malignant and not-malignant predictions. Does that make sense?
Logistic Regression Feature Importance
We can fit a LogisticRegression model on a classification dataset and retrieve the coef_ property, which contains the coefficients found for each input variable. These coefficients can provide the basis for a crude feature importance score.
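For instance, a minimal sketch (using a synthetic dataset from make_classification as a stand-in, since the exact dataset is not specified here):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Synthetic binary classification data as a placeholder.
    X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
    model = LogisticRegression().fit(X, y)

    # coef_ has shape (1, n_features) for a binary problem; positive values
    # push the prediction toward the positive class, negative values away.
    for i, c in enumerate(model.coef_[0]):
        print(f"feature {i}: {c:+.3f}")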
The concept is really straightforward: we measure the importance of a feature by calculating the increase in the model's prediction error after permuting the feature. A feature is "important" if shuffling its values increases the model error, because in that case the model relied on the feature for its predictions.
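scikit-learn ships a ready-made implementation of this idea, sklearn.inspection.permutation_importance (available from scikit-learn 0.22 on); the synthetic data below is only a placeholder:

    from sklearn.datasets import make_classification
    from sklearn.inspection import permutation_importance
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression().fit(X_train, y_train)

    # Shuffle each feature in turn and record how much the held-out
    # score drops; a larger drop means a more important feature.
    result = permutation_importance(model, X_test, y_test, n_repeats=30,
                                    random_state=0)
    for i, imp in enumerate(result.importances_mean):
        print(f"feature {i}: mean score drop = {imp:.3f}")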
Consider the iris dataset: its features are sepal length, sepal width, petal length, and petal width, and its target classes are setosa, versicolor, and virginica. Because the target has 3 classes, a one-vs-rest logistic regression builds 3 different binary classification models.
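To illustrate (note that recent scikit-learn versions fit a single multinomial model by default rather than three one-vs-rest models, but either way coef_ comes back with one row of coefficients per class, so you can read off per-class importances):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    iris = load_iris()
    model = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

    # coef_ has shape (3, 4): one row per class, one column per feature.
    print(model.coef_.shape)
    for name, row in zip(iris.target_names, model.coef_):
        print(name, row)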
One of the simplest options to get a feeling for the "influence" of a given parameter in a linear classification model (logistic regression being one of those) is to consider the magnitude of its coefficient times the standard deviation of the corresponding parameter in the data.
Consider this example:
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Three features with very different scales but equal "true" weights.
    x1 = np.random.randn(100)
    x2 = 4 * np.random.randn(100)
    x3 = 0.5 * np.random.randn(100)
    y = (3 + x1 + x2 + x3 + 0.2 * np.random.randn(100)) > 0
    X = np.column_stack([x1, x2, x3])

    m = LogisticRegression()
    m.fit(X, y)

    # The estimated coefficients will all be around 1:
    print(m.coef_)

    # Those values, however, will show that the second parameter
    # is more influential:
    print(np.std(X, 0) * m.coef_)
An alternative way to get a similar result is to examine the coefficients of the model fit on standardized parameters:
    m.fit(X / np.std(X, 0), y)
    print(m.coef_)
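If you prefer, the same standardization can be folded into a pipeline; the sketch below continues from the example above and uses StandardScaler, which also centers the features (unlike the plain division):

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Reusing X and y from the example above.
    pipe = make_pipeline(StandardScaler(), LogisticRegression())
    pipe.fit(X, y)
    print(pipe.named_steps["logisticregression"].coef_)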
Note that this is the most basic approach, and a number of other techniques for finding feature importance or parameter influence exist (using p-values, bootstrap scores, various "discriminative indices", etc.).
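As a rough sketch of the bootstrap option mentioned above, you can refit the model on resampled data and look at the spread of the scaled coefficients (again continuing from the example above):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils import resample

    rng = np.random.RandomState(0)
    scaled_coefs = []
    for _ in range(200):
        # Draw a bootstrap sample (with replacement) and refit.
        Xb, yb = resample(X, y, random_state=rng)
        mb = LogisticRegression().fit(Xb, yb)
        scaled_coefs.append((np.std(Xb, 0) * mb.coef_)[0])

    scaled_coefs = np.array(scaled_coefs)
    # Features whose scaled coefficient is large and stable across
    # resamples are the ones the model reliably leans on.
    print("mean:", scaled_coefs.mean(axis=0))
    print("std: ", scaled_coefs.std(axis=0))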
I am pretty sure you would get more interesting answers at https://stats.stackexchange.com/.