 

How do I run a 2SLS IV regression in Python using statsmodels?

I'm trying to run a two-stage least squares (2SLS) regression in Python using the statsmodels library:

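from statsmodels.sandbox.regression.gmm import IV2SLS

# endog: dependent variable; exog: all regressors, including the endogenous 'Reads Nutri';
# instrument: the excluded instrument 'Diabetes' plus the exogenous regressors
resultIV = IV2SLS(dietdummy['Log Income'],
                  dietdummy.drop(columns=['Log Income', 'Diabetes']),
                  dietdummy.drop(columns=['Log Income', 'Reads Nutri'])).fit()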

Reads Nutri is my endogenous variable, my instrument is Diabetes, and my dependent variable is Log Income.

Did I do this right? It is very different from the way I would do it in Stata.

Also, when I call resultIV.summary(), I get a TypeError (something to do with the F statistic being NoneType). How can I resolve this?

asked May 03 '16 18:05 by NANA


People also ask

How do you use 2SLS regression?

In a point-and-click statistics package, open the “analysis” menu and choose the “regression” option, then select two-stage least squares (2SLS) regression. In the 2SLS dialog, choose the dependent, independent, and instrumental variables, and click “OK”.

Is 2SLS the same as IV?

Yes: 2SLS is an instrumental-variables (IV) estimator. The name 2SLS is generally used for models with more than one instrument for a single endogenous explanatory variable, but you can also use two-stage least squares for a model with exactly one instrumental variable, in which case it coincides with the simple IV estimator.
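To check the just-identified case numerically, here is a small NumPy sketch on simulated data (the data-generating process and variable names are hypothetical, chosen only for illustration). With one instrument `z` for one endogenous regressor `x` plus a constant, running the two stages gives the same slope as the simple IV ratio cov(z, y) / cov(z, x).

```python
import numpy as np

# Hypothetical simulated data: u is an unobserved confounder that makes x endogenous.
rng = np.random.default_rng(42)
n = 2_000
z = rng.normal(size=n)                      # single instrument
u = rng.normal(size=n)                      # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)        # endogenous regressor
y = 1.0 + 2.0 * x + u + rng.normal(size=n)  # true slope on x is 2.0

ones = np.ones(n)
X = np.column_stack([ones, x])  # regressors: constant + endogenous x
Z = np.column_stack([ones, z])  # instruments: constant + z

# 2SLS: first stage projects X onto Z, second stage regresses y on the projection.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Simple IV estimator for the just-identified case: ratio of sample covariances.
slope_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(beta_2sls[1], slope_iv)  # equal up to floating-point error
```

The agreement is exact in the just-identified case because the 2SLS formula (X'P_Z X)^(-1) X'P_Z y collapses to (Z'X)^(-1) Z'y when Z has as many columns as X.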

What is the difference between OLS and 2SLS?

To see why you can't compare them as though they estimate the same thing, recall that, in the best-case scenario, 2SLS estimates a local average treatment effect (LATE), whereas OLS estimates an average treatment effect (ATE) if E(D'e) = 0, i.e. if the treatment D is uncorrelated with the error term e.

What is 2SLS estimation?

Two-stage least-squares regression uses instrumental variables that are uncorrelated with the error terms to compute estimated values of the problematic predictor(s) (the first stage), and then uses those computed values to estimate a linear regression model of the dependent variable (the second stage).
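The two stages described above can be sketched directly with NumPy least squares. This is a minimal illustration on hypothetical simulated data (one exogenous control `w`, two instruments `z1` and `z2`, one endogenous regressor `x`), not a replacement for a dedicated IV routine; the helper name `two_stage_least_squares` is made up for this example.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """Point estimates of 2SLS by running the two OLS stages explicitly.

    y : (n,) dependent variable
    X : (n, k) regressors: constant, exogenous controls, endogenous variable(s)
    Z : (n, m) instruments plus the same constant and exogenous controls, m >= k
    """
    # First stage: regress each column of X on Z and keep the fitted values.
    # Columns of X that also appear in Z (constant, controls) are reproduced exactly.
    first_stage_coefs = np.linalg.lstsq(Z, X, rcond=None)[0]
    X_hat = Z @ first_stage_coefs
    # Second stage: regress y on the first-stage fitted values.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Hypothetical simulated data: u is an unobserved confounder that makes x endogenous.
rng = np.random.default_rng(0)
n = 5_000
w = rng.normal(size=n)                           # exogenous control
z1, z2 = rng.normal(size=n), rng.normal(size=n)  # two instruments
u = rng.normal(size=n)
x = 0.5 * z1 + 0.5 * z2 + 0.3 * w + u + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.0 * w + u + rng.normal(size=n)

ones = np.ones(n)
X = np.column_stack([ones, w, x])
Z = np.column_stack([ones, w, z1, z2])

beta_2sls = two_stage_least_squares(y, X, Z)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("2SLS:", beta_2sls)  # slope on x should be close to the true value 2.0
print("OLS: ", beta_ols)   # slope on x is biased upward because x is correlated with u
```

Only the point estimates are valid here: the standard errors from a naive second-stage OLS are wrong, because its residuals use X_hat instead of the original X. For inference, use a proper IV implementation such as the ones discussed in the answers.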


2 Answers

I found this question when I wanted to run an IV2SLS regression myself and ran into the same problem, so this is for everybody else who lands here.

The statsmodels documentation shows how to use this class. Its arguments are endog, exog, and instrument, in that order, where endog is the dependent variable, exog includes the variables that are instrumented (along with the exogenous controls), and instrument contains the instruments and the other control variables. In that sense, your model is fine.

The TypeError you found is currently an open bug in versions 0.6.0 and 0.8.1, and according to the milestone it will be fixed in 0.9.0.

Update (28.06.2018): Version 0.9.0 was released on 15 May and should include a fix for the aforementioned bug.

answered Oct 23 '22 15:10 by tobiasraabe


Personally, I found the IV2SLS function in linearmodels 4.5 more intuitive than the statsmodels version, because it has separate parameters for the dependent variable, the exogenous regressors, the endogenous variable(s), and the instruments, whereas the statsmodels version combines the exogenous and endogenous regressors in a single exog argument. The results I got from the linearmodels function matched what I got from an Excel add-in provided through my school.

If you choose to use the linearmodels function, this guide should also help. For instance, it showed me that I needed to add a constant for the function to produce the correct output.

answered Oct 23 '22 15:10 by KBurchfiel