Scikit-Learn Linear Regression: how to get the features corresponding to the coefficients?


I'm trying to perform feature selection by evaluating my regression's coefficient outputs and selecting the features with the highest-magnitude coefficients. The problem is that I don't know how to get the respective features, because only the coefficients are returned from the coef_ attribute. The documentation says:

Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.
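The sketch below makes the quoted shapes concrete using random toy data. The arrays X, y_single and y_multi are made up for illustration and do not come from the question. In both cases the last axis of coef_ has one entry per column of X, in the same order as those columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: 5 samples (documents) x 3 features
rng = np.random.default_rng(0)
X = rng.random((5, 3))

# One target: y is 1-D, so coef_ is 1-D with length n_features
y_single = rng.random(5)
single = LinearRegression().fit(X, y_single)
print(single.coef_.shape)  # (3,)

# Two targets: y is 2-D, so coef_ has shape (n_targets, n_features)
y_multi = rng.random((5, 2))
multi = LinearRegression().fit(X, y_multi)
print(multi.coef_.shape)  # (2, 3)
```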

I am calling regression.fit(A, B), where A is a 2-D array holding the tf-idf value of each feature in each document. Example format:

        "feature1"  "feature2"
"Doc1"  .44         .22
"Doc2"  .11         .6
"Doc3"  .22         .2

B holds my target values, which are just numbers from 1 to 100 associated with each document:

"Doc1"  50
"Doc2"  11
"Doc3"  99

Using regression.coef_, I get a list of coefficients but not their corresponding features. How can I get the features? I'm guessing I need to modify the structure of my B targets, but I don't know how.


asked Nov 15 '14 23:11

jeffrey

2 Answers

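What I found to work was the following, where X is a pandas DataFrame of your independent variables (so that X.columns holds the feature names):

import numpy as np
import pandas as pd

coefficients = pd.concat([pd.DataFrame(X.columns), pd.DataFrame(np.transpose(regression.coef_))], axis=1)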

The assumption you stated, that the order of regression.coef_ matches the column order of the training set, holds true in my experience. It matches the underlying data and also checks out against the correlations between X and y.
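The sketch below is a self-contained version of that pd.concat approach, fitted with a LinearRegression named `regression` as in the question. The toy data is hypothetical and built so that the true weights are known (2 for feature1, 0 for feature2, -7 for feature3). The recovered coefficients should therefore land next to the right names, which makes the column-order claim easy to check.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data: y depends only on feature1 and feature3
X = pd.DataFrame(
    {
        "feature1": [0.44, 0.11, 0.22, 0.90, 0.35],
        "feature2": [0.22, 0.60, 0.20, 0.10, 0.75],
        "feature3": [0.05, 0.30, 0.80, 0.45, 0.15],
    }
)
y = 2.0 * X["feature1"] - 7.0 * X["feature3"] + 1.0

regression = LinearRegression().fit(X, y)

# Feature names in one column, matching coefficients in the next
coefficients = pd.concat(
    [pd.DataFrame(X.columns), pd.DataFrame(np.transpose(regression.coef_))],
    axis=1,
)
coefficients.columns = ["Feature", "Coefficient"]
print(coefficients)
```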


answered Oct 07 '22 16:10

Kirsche

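You can do that by creating a DataFrame indexed by the feature names (X must be a pandas DataFrame for X.columns to exist):

import pandas as pd

cdf = pd.DataFrame(regression.coef_, X.columns, columns=['Coefficients'])
print(cdf)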
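The sketch below extends the DataFrame approach to the feature-selection goal in the question: sort by absolute coefficient and keep the top k. The data, the value of k and the target names are hypothetical. The snippet above assumes a single target, where coef_ is 1-D. With several targets, or a binary LogisticRegression whose coef_ has shape (1, n_features), coef_ is 2-D and has to be transposed so that its rows line up with X.columns.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: four documents' tf-idf values and one target
X = pd.DataFrame(
    {"feature1": [0.44, 0.11, 0.22, 0.70], "feature2": [0.22, 0.60, 0.20, 0.35]},
    index=["Doc1", "Doc2", "Doc3", "Doc4"],
)
y = np.array([50, 11, 99, 40])

regression = LinearRegression().fit(X, y)

# 1-D coef_: one row per feature
cdf = pd.DataFrame(regression.coef_, X.columns, columns=["Coefficients"])

# Order features by absolute coefficient, largest first, then keep the top k
k = 1
top = cdf.reindex(cdf["Coefficients"].abs().sort_values(ascending=False).index)
print(top.head(k))

# 2-D coef_ (two targets): transpose so rows are features, columns are targets
Y = np.column_stack([y, 100 - y])
multi = LinearRegression().fit(X, Y)
cdf_multi = pd.DataFrame(multi.coef_.T, index=X.columns, columns=["target1", "target2"])
print(cdf_multi)
```

Because the second target is 100 - y, its coefficients come out as the negation of the first target's, which is a quick sanity check that the transpose put each value in the right cell.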

answered Oct 07 '22 17:10

Pran Kumar Sarkar
