I am looking for influence statistics after fitting a linear regression. In R I can obtain them (e.g.) like this:
hatvalues(fitted_model) #hatvalues (leverage)
cooks.distance(fitted_model) #Cook's D values
rstandard(fitted_model) #standardized residuals
rstudent(fitted_model) #studentized residuals
etc.
How can I obtain the same statistics when using statsmodels in Python after fitting a model like this:
#import statsmodels
import statsmodels.api as sm
#Fit linear model to any dataset
model = sm.OLS(Y,X)
results = model.fit()
#Creating a dataframe that includes the studentized residuals
results.outlier_test()
Edit: See answer below...
In a linear regression, the standardized residual is found by dividing the residual (the difference between the observed and fitted values) by an estimate of its standard deviation. The standardized residual can be interpreted like any standard score: its mean is approximately 0 and its standard deviation is approximately 1.
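For an OLS fit, the quantities that R reports through hatvalues, rstandard, rstudent and cooks.distance are usually defined in terms of the leverage h_ii (the diagonal of the hat matrix). Below is a minimal sketch of those textbook formulas in plain NumPy. The data are synthetic and the variable names are my own, so treat it as an illustration rather than the way any library computes them internally.

```python
import numpy as np

# Synthetic data, purely for illustration (not from the question).
rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # design matrix with an intercept column
y = 1.0 + 2.0 * x + rng.normal(size=n)

p = X.shape[1]                          # number of estimated coefficients
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta                    # raw residuals (observed minus fitted)

# Leverage: diagonal of the hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

# Estimate of the residual variance
s2 = resid @ resid / (n - p)

# Standardized (internally studentized) residuals
r_std = resid / np.sqrt(s2 * (1 - h))

# Studentized (externally studentized) residuals: the variance estimate
# for observation i leaves that observation out
s2_loo = (resid @ resid - resid**2 / (1 - h)) / (n - p - 1)
r_stud = resid / np.sqrt(s2_loo * (1 - h))

# Cook's distance
cooks_d = r_std**2 * h / (p * (1 - h))
```

Forming the full n-by-n hat matrix is fine for a small example like this, but it is wasteful for large datasets.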
Use the OLSResults.outlier_test() method to produce a table that contains the studentized residual for each observation. Here X is the matrix of independent variables.
Although the accepted answer is correct, I found it helpful to access the statistics separately as instance attributes of an influence instance (statsmodels.regression.linear_model.OLSResults.get_influence) after fitting my model. This saved me from having to index the summary_frame, as I was only interested in one of the statistics and not all of them. So maybe this helps somebody else:
import statsmodels.api as sm
#Fit linear model to any dataset
model = sm.OLS(Y,X)
results = model.fit()
#create instance of influence
influence = results.get_influence()
#leverage (hat values)
leverage = influence.hat_matrix_diag
#Cook's D values (and p-values) as tuple of arrays
cooks_d = influence.cooks_distance
#standardized residuals
standardized_residuals = influence.resid_studentized_internal
#studentized residuals
studentized_residuals = influence.resid_studentized_external
I found it here, in the documentation for OLSInfluence.summary_frame():
http://www.statsmodels.org/dev/generated/statsmodels.stats.outliers_influence.OLSInfluence.summary_frame.html