Statsmodels - Wald Test for significance of trend in coefficients in Linear Regression Model (OLS)

I have used Statsmodels to fit an OLS linear regression model that predicts a dependent variable from about 10 independent variables. The independent variables are all categorical.

I am interested in looking more closely at the significance of the coefficients for one of the independent variables. It has 4 categories, so there are 3 coefficients, each of which is highly significant. I would also like to look at the significance of the trend across all 3 categories. From my (limited) understanding, this is often done with a Wald test that compares all of the coefficients to 0.

How exactly is this done using Statsmodels? I see there is a wald_test method on the OLS results. It seems you have to pass in values for all of the coefficients when using this method.

My approach was the following...

First, here are all of the coefficients:

np.array(lm.params) = array([ 0.21538725,  0.05675108,  0.05020252,  0.08112228,  0.00074715,
        0.03886747,  0.00981819,  0.19907263,  0.13962354,  0.0491201 ,
       -0.00531318,  0.00242845, -0.0097336 , -0.00143791, -0.01939182,
       -0.02676771,  0.01649944,  0.01240742, -0.00245309,  0.00757727,
        0.00655152, -0.02895381, -0.02027537,  0.02621716,  0.00783884,
        0.05065323,  0.04264466, -0.13068456, -0.15694931, -0.25518566,
       -0.0308599 , -0.00558183,  0.02990139,  0.02433505, -0.01582824,
       -0.00027538,  0.03170669,  0.01130944,  0.02631403])

I am only interested in params 2-4 (which are the 3 coefficients of interest).

coeffs = np.zeros_like(lm.params)
coeffs[1:4] = [0.05675108,  0.05020252,  0.08112228]

Checking to make sure this worked:

array([ 0.        ,  0.05675108,  0.05020252,  0.08112228,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ])

Looks good, now to run the test!

lm.wald_test(coeffs) = 
<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=array([[ 13.11493673]]), p=0.000304699208434, df_denom=1248, df_num=1>

Is this the correct approach? I could really use some help!

asked Nov 25 '25 at 18:11 by JHawkins


1 Answer

A linear hypothesis has the form R params = q, where R is the matrix that defines the linear combinations of parameters and q is the vector of hypothesized values.

In the simple case where we want to test whether some parameters are zero, each row of R has a 1 in the column corresponding to the position of one parameter and zeros everywhere else, and q is zero, which is the default. Each row specifies one linear combination of parameters, that is, one hypothesis that forms part of the overall (joint) hypothesis.
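To make the R params = q form concrete, here is a minimal NumPy-only sketch of the F form of the Wald statistic, F = (R b - q)' [R V R']^(-1) (R b - q) / J, where b are the estimates, V their covariance matrix and J the number of rows of R. The data and the helper name wald_f are hypothetical. For OLS with the default non-robust covariance, this should agree with the F that statsmodels reports; a p-value would come from the F(J, n - k) distribution, for example via scipy.stats.f.sf.

```python
import numpy as np


def wald_f(params, cov, R, q=None):
    """Hypothetical helper: F form of the Wald test of R @ params = q."""
    R = np.atleast_2d(np.asarray(R, dtype=float))
    q = np.zeros(R.shape[0]) if q is None else np.asarray(q, dtype=float)
    diff = R @ params - q                        # distance of R b from q
    middle = R @ cov @ R.T                       # covariance of R b
    wald = diff @ np.linalg.solve(middle, diff)  # chi-square form of the statistic
    return wald / R.shape[0]                     # divide by J restrictions


# Hypothetical data: intercept, 3 dummies for a 4-level factor, 1 covariate.
rng = np.random.default_rng(0)
n = 200
groups = rng.integers(0, 4, size=n)
X = np.column_stack([
    np.ones(n),
    groups == 1,
    groups == 2,
    groups == 3,
    rng.normal(size=n),
]).astype(float)
y = X @ np.array([1.0, 0.5, 0.3, 0.8, 0.2]) + rng.normal(size=n)

# OLS estimates and the usual non-robust covariance matrix of the estimates.
params, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ params
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)

# Rows 1..3 of the identity matrix: all three dummy coefficients are zero.
R = np.eye(len(params))[1:4]
print("joint F:", wald_f(params, cov, R))
```

With a single identity row the statistic reduces to the squared t statistic of that coefficient, which is a quick sanity check on the formula.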

In this case, the simplest way to get the restriction matrix is to take the corresponding rows of an identity matrix:

R = np.eye(len(lm.params))[1:4]

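Then, lm.wald_test(R) will provide the test for the joint hypothesis that the 3 parameters are zero (df_num=3). Passing a single vector, as in the question, instead tests only one linear combination of the parameters, which is why the output above shows df_num=1.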
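The sketch below puts the identity-row restriction into practice. It assumes a recent statsmodels with the formula API and uses made-up data with a 4-level factor g and a continuous covariate x. It also builds a vector the way the question does, to show that a 1-D array is read as a single restriction row while the three-row R gives the joint test. fvalue is squeezed because, depending on the statsmodels version, it comes back either as a 1x1 array or as a scalar.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data: a 4-level categorical "g" and a continuous "x".
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({"g": rng.choice(list("abcd"), size=n),
                   "x": rng.normal(size=n)})
effects = {"a": 0.0, "b": 0.3, "c": 0.5, "d": 0.9}
df["y"] = 1.0 + df["g"].map(effects) + 0.2 * df["x"] + rng.normal(size=n)

lm = smf.ols("y ~ C(g) + x", data=df).fit()
# Parameter order: Intercept, C(g)[T.b], C(g)[T.c], C(g)[T.d], x
print(list(lm.params.index))

# Joint hypothesis: one identity row per coefficient being tested.
R = np.eye(len(lm.params))[1:4]
joint = lm.wald_test(R)
print("joint:", float(np.squeeze(joint.fvalue)), "df_num =", joint.df_num)

# The question's approach: one vector holding the estimates themselves.
coeffs = np.zeros(len(lm.params))
coeffs[1:4] = lm.params.iloc[1:4]
single = lm.wald_test(coeffs)
print("single:", float(np.squeeze(single.fvalue)), "df_num =", single.df_num)
```

Newer statsmodels versions may emit a FutureWarning about the scalar option of wald_test; it does not change the result here.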

A simpler way to specify the restriction is to use the names of the parameters and define the restrictions as a list of strings.
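A minimal sketch of the name-based form, using the same kind of made-up data. The names C(g)[T.b], C(g)[T.c] and C(g)[T.d] are what a formula term C(g) with reference level a produces. Here the restrictions are written as one string of comma-separated equations, which statsmodels parses against the parameter names, and the result is compared with the identity-matrix version.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data: a 4-level categorical "g" and a continuous "x".
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({"g": rng.choice(list("abcd"), size=n),
                   "x": rng.normal(size=n)})
effects = {"a": 0.0, "b": 0.3, "c": 0.5, "d": 0.9}
df["y"] = 1.0 + df["g"].map(effects) + 0.2 * df["x"] + rng.normal(size=n)
lm = smf.ols("y ~ C(g) + x", data=df).fit()

# Restrictions written with the generated parameter names.
hypothesis = "C(g)[T.b] = 0, C(g)[T.c] = 0, C(g)[T.d] = 0"
by_name = lm.wald_test(hypothesis)

# The same joint hypothesis built from identity-matrix rows, for comparison.
by_matrix = lm.wald_test(np.eye(len(lm.params))[1:4])

print(by_name)
print(float(np.squeeze(by_name.fvalue)), float(np.squeeze(by_matrix.fvalue)))
```

Writing the hypothesis by name avoids counting column positions, which matters once the model has many categorical variables.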

The model result classes also have a new method, wald_test_terms, which automatically generates the Wald tests for terms in the design matrix where the hypothesis includes several parameters or columns, as in the case of categorical or polynomial explanatory variables. This is available in statsmodels master and will be in the upcoming 0.7 release.
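A minimal sketch of wald_test_terms on the same kind of made-up data, assuming a statsmodels release that includes it (0.7 or later). The result's table attribute is a pandas DataFrame with one row per formula term, so the C(g) row should reproduce the joint test of the three dummy coefficients.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data: a 4-level categorical "g" and a continuous "x".
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({"g": rng.choice(list("abcd"), size=n),
                   "x": rng.normal(size=n)})
effects = {"a": 0.0, "b": 0.3, "c": 0.5, "d": 0.9}
df["y"] = 1.0 + df["g"].map(effects) + 0.2 * df["x"] + rng.normal(size=n)
lm = smf.ols("y ~ C(g) + x", data=df).fit()

# One Wald test per term; multi-column terms such as C(g) get a joint test.
terms = lm.wald_test_terms()
print(terms)

# The C(g) row should match the identity-matrix restriction on the dummies.
manual = lm.wald_test(np.eye(len(lm.params))[1:4])
print(terms.table.loc["C(g)"])
```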

answered Nov 27 '25 at 07:11 by Josef


