I'm trying to do two-stage least squares regression in Python using the statsmodels library:
from statsmodels.sandbox.regression.gmm import IV2SLS
resultIV = IV2SLS(dietdummy['Log Income'],
                  dietdummy.drop(['Log Income', 'Diabetes'], axis=1),
                  dietdummy.drop(['Log Income', 'Reads Nutri'], axis=1))
Reads Nutri is my endogenous variable, my instrument is Diabetes, and my dependent variable is Log Income.
Did I do this right? It is quite different from the way I would do it in Stata.
Also, when I call resultIV.summary(), I get a TypeError (something to do with the F statistic being NoneType). How can I resolve this?
Click on the “analysis” menu and select the “regression” option. Select two-stage least squares (2SLS) regression analysis from the regression option. From the 2SLS regression window, select the dependent, independent and instrumental variable. Click on the “ok” button.
Generally 2SLS is referred to as IV estimation for models with more than one instrument and with only one endogenous explanatory variable. You can also use two stage least squares estimation for a model with one instrumental variable.
To see why you can't compare them as though they estimate the same thing, recall that in the best-case scenario, 2SLS estimates a local average treatment effect (LATE) whereas OLS estimates an average treatment effect (ATE) if E(D'e) = 0.
Two-stage least-squares regression uses instrumental variables that are uncorrelated with the error terms to compute estimated values of the problematic predictor(s) (the first stage), and then uses those computed values to estimate a linear regression model of the dependent variable (the second stage).
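To make the two stages concrete, here is a minimal sketch in plain Python for the simplest case: one endogenous regressor, one instrument, and a constant. The data are simulated and the helper names (ols_simple, two_stage_least_squares) are made up for illustration; they do not come from any library.

```python
import random


def ols_simple(x, y):
    """Regress y on a constant and x; return (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z."""
    # First stage: regress the endogenous regressor on the instrument.
    a0, a1 = ols_simple(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Second stage: regress the outcome on the fitted values.
    return ols_simple(x_hat, y)


# Simulated data (assumption): u enters both x and y, so x is endogenous.
random.seed(0)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.5 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]  # true slope is 2.0

ols_intercept, ols_slope = ols_simple(x, y)
iv_intercept, iv_slope = two_stage_least_squares(y, x, z)
print("OLS slope: ", round(ols_slope, 3))
print("2SLS slope:", round(iv_slope, 3))
```

In this simulation the plain OLS slope is pulled away from the true value because x is correlated with the error, while the 2SLS slope should land close to it. Note that running the second stage by hand like this gives the right point estimate but not the right standard errors, which is one reason to use a dedicated IV routine.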
I found this question when I wanted to run an IV2SLS regression myself and had the same problem, so this is for everybody else who lands here.
The statsmodels documentation shows how to use this class. Its arguments are endog, exog, and instrument, in that order, where exog includes the variables that are instrumented and instrument contains the instruments plus the other control variables. In that sense, your model is fine.
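As a rough illustration of that argument order, here is a small sketch on simulated data. The variable names (x, z, control) and the coefficients are invented for the example, and it assumes numpy and statsmodels are installed; it is not the asker's dataset.

```python
import numpy as np
from statsmodels.sandbox.regression.gmm import IV2SLS

# Simulated data (assumption): x is endogenous because it shares u with y.
rng = np.random.default_rng(0)
n = 2000
control = rng.normal(size=n)   # exogenous control variable
z = rng.normal(size=n)         # instrument, excluded from the outcome equation
u = rng.normal(size=n)         # unobserved error
x = 0.8 * z + 0.3 * control + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * control + u

const = np.ones(n)

# exog: the regressors of the outcome equation, including the endogenous x.
exog = np.column_stack([const, x, control])
# instrument: the same columns, with the endogenous x replaced by z.
instrument = np.column_stack([const, z, control])

results = IV2SLS(y, exog, instrument).fit()
print(results.params)  # order follows exog: const, x, control
```

The constant and the control appear in both exog and instrument, which mirrors the two drop() calls in the question: each matrix leaves out the dependent variable plus either the instrument or the endogenous variable.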
The TypeError you found is currently an open bug in versions 0.6.0 and 0.8.1 and will be fixed in 0.9.0 according to the milestone.
Update (28.06.2018): Version 0.9.0 was released on 15 May and should include a fix for the aforementioned bug.
Personally, I found the IV2SLS function in linearmodels 4.5 to be more intuitive than the statsmodels version, as it has separate parameters for the dependent variable and the endogenous variable(s), whereas the statsmodels version doesn't. The results I got from the linearmodels function lined up with what I would get with an Excel add-in I got through school.
If you choose to use the linearmodels function, this guide should also help. For instance, it showed me that I needed to add in a constant for my function to produce the correct output.