When calling .summary() on an OLS model fitted with statsmodels (using a pandas DataFrame), the OLS Regression Results include the following fields:
coef std err t P>|t| [0.025 0.975]
How can I get the standardised coefficients (which exclude the intercept), similarly to what is achievable in SPSS?
You just need to standardize your original DataFrame first by converting every column to z-scores (mean 0, standard deviation 1), and then perform the linear regression on the standardized data.
Assume your DataFrame is named df and has independent variables x1, x2, and x3, and dependent variable y. Consider the following code:
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.formula.api as smf
# standardizing dataframe
df_z = df.select_dtypes(include=[np.number]).dropna().apply(stats.zscore)
# fitting regression
formula = 'y ~ x1 + x2 + x3'
result = smf.ols(formula, data=df_z).fit()
# checking results
print(result.summary())
Now, the coef column will show you the standardized (beta) coefficients, so that you can compare the influence of each independent variable on your dependent variable.
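As a sanity check, the following minimal sketch compares the z-score route with the textbook identity beta = b * sd(x) / sd(y), where b is the unstandardized slope. The data is synthetic and the names (rng, beta_z, beta_rescaled) are made up for illustration; it is not part of the original answer, only a way to convince yourself that both routes agree.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Synthetic data (hypothetical; stands in for the df used above)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(10, 2, n),
    "x2": rng.normal(0, 5, n),
    "x3": rng.normal(-3, 1, n),
})
df["y"] = 1.5 * df["x1"] - 0.4 * df["x2"] + 2.0 * df["x3"] + rng.normal(0, 1, n)

formula = "y ~ x1 + x2 + x3"

# Route 1: z-score every column, then fit
df_z = df.apply(stats.zscore)
fit_z = smf.ols(formula, data=df_z).fit()
beta_z = fit_z.params.drop("Intercept")

# Route 2: fit on the raw data, then rescale each slope by sd(x) / sd(y)
raw = smf.ols(formula, data=df).fit().params.drop("Intercept")
sd = df.std(ddof=0)
beta_rescaled = raw * sd[raw.index] / sd["y"]

print(pd.DataFrame({"z-scored fit": beta_z, "rescaled raw fit": beta_rescaled}))
```

The two columns should match up to floating-point error, and the intercept of the standardized fit is effectively zero, which is why it is left out of standardized output. The choice of ddof does not matter here as long as the same value is used for x and y, because it cancels in the ratio.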
Notes:
You need .dropna(). Otherwise, stats.zscore will return all NaN for a column if it has any missing values.
Instead of using .select_dtypes(), you can select columns manually, but make sure all the columns you select are numeric.
If you only care about the standardized (beta) coefficients, you can use result.params to return just those. They will usually be displayed in scientific notation; you can use something like round(result.params, 5) to round them.
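The effect of .dropna() can be seen on a toy frame. This is a small sketch with made-up values, assuming the default behaviour of stats.zscore, which propagates NaN:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy frame with one missing value in x1
df = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
})

# Without dropna(): the NaN propagates through the mean and standard
# deviation, so every z-score in that column becomes NaN
z_with_nan = df.apply(stats.zscore)
print(z_with_nan["x1"].isna().all())

# With dropna(): incomplete rows are removed first, so all z-scores are finite
z_clean = df.dropna().apply(stats.zscore)
print(z_clean.isna().any().any())
```

Keep in mind that .dropna() removes the whole row, so the regression is fitted only on complete cases.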