
How to get standardised (beta) coefficients for multiple linear regression using statsmodels

When calling .summary() on a fitted statsmodels OLS model, the OLS Regression Results table includes the following columns:

coef    std err          t      P>|t|      [0.025      0.975]

How can I get the standardised coefficients (which exclude the intercept), similar to what SPSS reports?

Andreuccio asked Jun 13 '18 16:06



1 Answer

You just need to standardize your original DataFrame first by converting each column to z-scores, and then fit the linear regression on the standardized data.

Assume your DataFrame is named df, with independent variables x1, x2, and x3 and dependent variable y. Consider the following code:

import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.formula.api as smf

# standardize: keep numeric columns, drop rows with missing values, z-score each column
df_z = df.select_dtypes(include=[np.number]).dropna().apply(stats.zscore)

# fitting regression
formula = 'y ~ x1 + x2 + x3'
result = smf.ols(formula, data=df_z).fit()

# checking results (print() is needed when running as a script rather than in a notebook)
print(result.summary())

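The coef column now shows the standardized (beta) coefficients, so you can compare the relative influence of each predictor on the dependent variable. Because every variable has mean zero after z-scoring, the fitted intercept is effectively zero and can be ignored.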
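As a sanity check, the same betas can be recovered from an ordinary (unstandardized) fit by rescaling each slope by sd(x) / sd(y). The sketch below illustrates this on synthetic data; the data, the seed, and the variable names beta_z and beta_rescaled are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Synthetic data, invented purely for illustration
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(0, 1, n),
    "x2": rng.normal(5, 10, n),
    "x3": rng.normal(-2, 0.5, n),
})
df["y"] = 1.5 + 2.0 * df["x1"] + 0.3 * df["x2"] - 4.0 * df["x3"] + rng.normal(0, 1, n)

formula = "y ~ x1 + x2 + x3"

# Route 1: z-score every column, then fit
df_z = df.apply(stats.zscore)
beta_z = smf.ols(formula, data=df_z).fit().params.drop("Intercept")

# Route 2: fit on the raw data, then rescale each slope by sd(x) / sd(y)
raw = smf.ols(formula, data=df).fit().params.drop("Intercept")
sd = df.std(ddof=0)  # same ddof as stats.zscore's default
beta_rescaled = raw * sd[raw.index] / sd["y"]

# The two columns should agree up to floating-point error
print(pd.DataFrame({"z-scored fit": beta_z, "rescaled raw fit": beta_rescaled}))
```

The choice of ddof does not affect the result here, as long as the same value is used for both the predictors and the dependent variable, since only the ratio of standard deviations matters.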

Notes:

  1. Keep in mind that you need .dropna(). Otherwise, stats.zscore will return NaN for every value in a column that has any missing values.
  2. Instead of using .select_dtypes(), you can select the columns manually, but make sure all the columns you select are numeric.
  3. If you only care about the standardized (beta) coefficients, you can use result.params to return just those. They will usually be displayed in scientific notation. You can use something like round(result.params, 5) to round them.
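A minimal sketch of notes 1 and 2, using a tiny made-up DataFrame (the column names and values are illustrative only): it shows how one missing value turns the whole z-scored column into NaN, and how selecting the numeric columns by hand and calling .dropna() first avoids that.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Tiny invented example with one missing value and one non-numeric column
df = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "label": ["a", "b", "c", "d"],
})

# Select the numeric columns manually instead of using select_dtypes()
numeric_cols = ["x1", "y"]

# Without dropna(): the NaN in x1 propagates to every entry of that column
z_with_nan = df[numeric_cols].apply(stats.zscore)
print(z_with_nan)

# With dropna(): the incomplete row is removed before standardizing
z_clean = df[numeric_cols].dropna().apply(stats.zscore)
print(z_clean)
```

Note that .dropna() removes a row if any of the selected columns is missing, so restricting the selection to the variables actually used in the model avoids discarding rows unnecessarily.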
steven answered Nov 10 '22 08:11