
How to find the feature names of the coefficients using scikit-learn linear regression?

from sklearn import linear_model

# training the models (train_data is a pandas DataFrame)
model_1_features = ['sqft_living', 'bathrooms', 'bedrooms', 'lat', 'long']
model_2_features = model_1_features + ['bed_bath_rooms']
model_3_features = model_2_features + ['bedrooms_squared', 'log_sqft_living', 'lat_plus_long']

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])

model_2 = linear_model.LinearRegression()
model_2.fit(train_data[model_2_features], train_data['price'])

model_3 = linear_model.LinearRegression()
model_3.fit(train_data[model_3_features], train_data['price'])

# extracting the coefficients
print(model_1.coef_)
print(model_2.coef_)
print(model_3.coef_)

If I change the order of the features, the coefficients still seem to be printed in the same order, so I would like to know how to map each coefficient to its feature.

amehta asked Jan 07 '16 07:01





2 Answers

The trick is that coef_ follows the order of the columns you passed to fit, so right after you have trained your model you know which coefficient belongs to which feature:

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])
print(list(zip(model_1.coef_, model_1_features)))

This prints each coefficient next to its feature name. (Tested with a pandas DataFrame.)

If you want to reuse the coefficients later you can also put them in a dictionary:

coef_dict = {}
for coef, feat in zip(model_1.coef_, model_1_features):
    coef_dict[feat] = coef

(You can test it for yourself by training two models with the same features but, as you said, shuffled order of features.)
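The shuffled-order check suggested above can be sketched as follows. This is only an illustration under stated assumptions: the asker's train_data is not available, so the DataFrame, its values, and the helper fit_coef_dict are hypothetical stand-ins, and only three of the original feature names are used.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Synthetic stand-in for train_data (hypothetical values, not the asker's data)
rng = np.random.default_rng(0)
train_data = pd.DataFrame({
    "sqft_living": rng.uniform(500, 4000, size=200),
    "bathrooms": rng.integers(1, 4, size=200).astype(float),
    "bedrooms": rng.integers(1, 6, size=200).astype(float),
})
# Build a price from known coefficients plus some noise
train_data["price"] = (
    200.0 * train_data["sqft_living"]
    + 15000.0 * train_data["bathrooms"]
    - 8000.0 * train_data["bedrooms"]
    + rng.normal(0, 1000, size=200)
)

features = ["sqft_living", "bathrooms", "bedrooms"]
shuffled = ["bedrooms", "sqft_living", "bathrooms"]


def fit_coef_dict(feature_list):
    """Fit a model on the given columns and map each feature name to its coefficient."""
    model = linear_model.LinearRegression()
    model.fit(train_data[feature_list], train_data["price"])
    # coef_ is ordered like the columns passed to fit, so zip pairs them correctly
    return dict(zip(feature_list, model.coef_))


coef_a = fit_coef_dict(features)
coef_b = fit_coef_dict(shuffled)

# The same feature gets (numerically) the same coefficient in both fits
for name in features:
    print(name, coef_a[name], coef_b[name])
```

Looking coefficients up by name rather than by position is what makes the result independent of the column order used for fitting.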

Robin Spiess answered Oct 03 '22 22:10



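@Robin posted a great answer, but I had to make one tweak to get it to work the way I wanted. My 'coef_' np.array was two-dimensional (scikit-learn returns coef_ with shape (n_targets, n_features) when the target passed to fit is 2-D, for example a single-column DataFrame), so I had to select the row I wanted with model_1.coef_[0,:], as below: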

coef_dict = {}
for coef, feat in zip(model_1.coef_[0,:], model_1_features):
    coef_dict[feat] = coef

Then the dict was created as I pictured it, with {'feature_name' : coefficient_value} pairs.
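A minimal sketch of why the extra index may or may not be needed, assuming a small synthetic DataFrame (the names df, a, b, and y are hypothetical, not from the question). It fits the same model once with a 1-D target and once with a 2-D target and compares the shape of coef_. For a single target, np.ravel is one way to handle both cases with the same code.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Hypothetical noise-free data: y = 3*a - 2*b
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=50), "b": rng.normal(size=50)})
df["y"] = 3.0 * df["a"] - 2.0 * df["b"]

feature_names = ["a", "b"]

# 1-D target (a Series): coef_ is 1-D
model_1d = linear_model.LinearRegression().fit(df[feature_names], df["y"])
# 2-D target (a single-column DataFrame): coef_ is 2-D
model_2d = linear_model.LinearRegression().fit(df[feature_names], df[["y"]])

print(model_1d.coef_.shape)  # (2,)
print(model_2d.coef_.shape)  # (1, 2)

# Flattening works for either shape when there is only one target
coef_dict = dict(zip(feature_names, np.ravel(model_2d.coef_)))
print(coef_dict)
```

With more than one target, each row of coef_ belongs to a different target, so flattening would mix them up; in that case index the row explicitly, as in coef_[0,:].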

rocksteady answered Oct 03 '22 22:10
