I would like to compute the beta or standardized coefficient of a linear regression model using standard tools in Python (numpy, pandas, scipy.stats, etc.).
A friend of mine told me that this is done in R with the following command:
lm(scale(y) ~ scale(x))
Currently, I am computing it in Python like this:
from scipy.stats import linregress
from scipy.stats.mstats import zscore
(beta_coeff, intercept, rvalue, pvalue, stderr) = linregress(zscore(x), zscore(y))
print('The Beta Coeff is: %f' % beta_coeff)
Is there a more straightforward function to compute this figure in Python?
You just need to standardize your original DataFrame first (i.e., convert every column to z-scores) and then fit a linear regression on the standardized data. The fitted coefficients are then the standardized (beta) coefficients, so you can compare the influence of each predictor on your dependent variable. Notes: Please keep in mind that you need .
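A minimal sketch of that recipe, assuming numpy and pandas are available. The helper name `standardized_coefficients`, the column names, and the synthetic data are made up for illustration; the fit uses plain least squares from numpy rather than any particular regression library.

```python
import numpy as np
import pandas as pd

def standardized_coefficients(df, target):
    """Hypothetical helper: beta coefficients of an OLS fit of `target` on all other columns."""
    # z-score every column: subtract the mean, divide by the standard deviation
    z = (df - df.mean()) / df.std(ddof=0)
    X = z.drop(columns=target).to_numpy()
    y = z[target].to_numpy()
    # No intercept column is needed because every z-scored column has mean zero.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=z.columns.drop(target))

# Synthetic example data (an assumption, only here to make the sketch runnable)
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200),
                   "x2": rng.normal(scale=5.0, size=200)})
df["y"] = 2.0 * df["x1"] + 0.1 * df["x2"] + rng.normal(size=200)

betas = standardized_coefficients(df, "y")
print(betas)
```

Because all columns are on the same scale after standardizing, the printed values can be compared directly, even though `x1` and `x2` were generated with very different spreads.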
Betas are obtained by first standardizing each variable, that is, subtracting its mean and dividing by its standard deviation, and then fitting the regression on the standardized variables. Standardizing gives each variable a mean of zero and a standard deviation of 1.
The standardized coefficient is found by multiplying the unstandardized coefficient by the ratio of the standard deviations of the independent variable and dependent variable.
Beta coefficients from regression coefficients: here x and y refer to the predictor and response variables. You take the standard deviation of the predictor variable, divide it by the standard deviation of the response, and multiply the result by the regression coefficient of the predictor under consideration.
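The conversion described above can be sketched with numpy alone. The function name `beta_from_slope` and the sample numbers are assumptions chosen for illustration; the sketch also checks the result against a direct fit on z-scored data.

```python
import numpy as np

def beta_from_slope(x, y):
    """Hypothetical helper: standardized coefficient from the unstandardized slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Unstandardized fit: y = slope * x + intercept
    slope, intercept = np.polyfit(x, y, 1)
    # Multiply by sd(x) / sd(y); the ddof choice cancels as long as it is the same for both
    return slope * x.std(ddof=1) / y.std(ddof=1)

# Illustrative data (made up)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

beta = beta_from_slope(x, y)

# Cross-check: regress z-scored y on z-scored x and read off the slope
z = lambda v: (v - v.mean()) / v.std(ddof=1)
beta_direct = np.polyfit(z(x), z(y), 1)[0]

print(beta, beta_direct)
```

Both routes give the same number. With a single predictor that number also coincides with the Pearson correlation, `np.corrcoef(x, y)[0, 1]`.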
Python is a general-purpose language, but R was designed specifically for statistics. It's almost always going to take a few more lines of code to achieve the same (statistical) goal in Python, purely because R comes ready to fit regression models (using lm) as soon as you boot it up.
The short answer to your question is no: your Python code is already pretty straightforward.
That said, I think a closer equivalent to your R code would be
import statsmodels.api as sm
from scipy.stats.mstats import zscore
print(sm.OLS(zscore(y), zscore(x)).fit().summary())