
How can I use the "ivprobit" function in "ivprobit" package in R?

I am trying to understand the syntax of the "ivprobit" function in the "ivprobit" package in R. The documentation says:

 Usage
 ivprobit(formula, data)

 Arguments
    formula y~x|y1|x2 whre y is the dichotomous l.h.s.,x is the r.h.s.    
            exogenous variables,y1 is the r.h.s. endogenous variables and 
            x2 is the complete set of instruments
    data    the dataframe

Then it shows the corresponding example:

 data(eco)

 pro<-ivprobit(d2~ltass+roe+div|eqrat+bonus|ltass+roe+div+gap+cfa,eco)

 summary(pro)

If I match this against the documentation's explanation,

 y= d2 = dichotomous l.h.s.
 x= ltass+roe+div = the r.h.s. exogenous variables
 y1= eqrat+bonus = the r.h.s. endogenous variables
 x2= ltass+roe+div+gap+cfa = the complete set of instruments

I do not understand the difference between x and x2. Also, if x2 is the complete set of instruments, why doesn't it include the endogenous variables y1 as well? It instead additionally includes "gap" and "cfa" variables which are not even shown in x (exogenous variables) or even in y either.

If, let's say, my chosen instrumental variables are indeed "eqrat" and "bonus", how should I construct the formula, given the difference between x (exogenous variables) and x2 (the complete set of instruments)?

Eric asked Feb 21 '19 12:02


1 Answer

Note that here we are discussing syntax, not the "goodness" of the model; for that kind of question, you should refer to https://stats.stackexchange.com/.

Let's use this equation as an example: $y = \beta_0 + \beta_1 y_1 + \beta_2 X + \varepsilon$.

As correctly pointed out, the instruments $Z$ do not actually appear in the equation; it's just an example.

Here:

  • $y$ is the dependent variable;
  • $y_1$ are the endogenous variables (one or more), which are "problematic";
  • $X$ are the exogenous variables (one or more), which are not "problematic";
  • $Z$ are the instruments (one or more), which "help" with the endogenous variables.

Why are the endogenous variables problematic? Because they are correlated with the error $\varepsilon$, which causes problems with classic OLS estimation.

$Z$ are the instruments because they have some fundamental properties (more here):

  • Independent of the error term;
  • Do not affect $y$ when $y_1$ is held constant;
  • Correlated with $y_1$.

In the proposed syntax, we have:

  • x, exogenous, corresponding to $X$ (not problematic);
  • y1, endogenous, corresponding to $y_1$ (problematic);
  • x2, complete set of instruments, corresponding to $Z$.

In the example you cite, x2 contains every variable in x, the set of exogenous (not problematic) variables, plus two more instruments, gap and cfa.

In other words, the model uses the three exogenous variables as instruments, plus two additional variables.
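To make the overlap concrete, here is a small base-R sketch that compares the variable sets from the documentation example. It only manipulates the variable names as character vectors (it does not call ivprobit), and the object names `excluded` and `n_endog` are just illustrative.

```r
# Variable names taken from the documentation example in the question.
x  <- c("ltass", "roe", "div")                 # exogenous regressors
y1 <- c("eqrat", "bonus")                      # endogenous regressors
x2 <- c("ltass", "roe", "div", "gap", "cfa")   # complete set of instruments

# Instruments that are not themselves regressors in the main equation.
excluded <- setdiff(x2, x)
print(excluded)                                # "gap" "cfa"

# Number of endogenous regressors those extra instruments have to cover.
n_endog <- length(y1)
print(length(excluded) >= n_endog)             # TRUE
```

Here gap and cfa are the only variables that enter through x2 alone, and there are as many of them as there are endogenous regressors.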

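"I do not understand the difference between x and x2"

x2 is the full set of instruments. It can overlap with the set of exogenous variables (x); in the documentation example it contains all of x, because exogenous regressors can serve as their own instruments, plus the additional instruments.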

"if x2 is the complete set of instruments, why doesn't it include the endogenous variables y1 as well?"

It must not include the endogenous variables, because those are exactly the variables the estimation has to correct for, using the instruments.
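One way to keep the three parts straight is to build the formula from separate character vectors and check that no endogenous variable slips into the instrument set. The sketch below uses only base R; `build_iv_formula` is a hypothetical helper, not part of the ivprobit package, and whether ivprobit accepts a formula object built this way (rather than one typed inline) is an assumption I have not verified.

```r
# Hypothetical helper (not part of ivprobit): builds y ~ x | y1 | x2
# from character vectors of variable names.
build_iv_formula <- function(y, x, y1, x2) {
  # Endogenous regressors must not appear among the instruments.
  clash <- intersect(y1, x2)
  if (length(clash) > 0) {
    stop("endogenous variables listed as instruments: ",
         paste(clash, collapse = ", "))
  }
  # Join each part with "+", then join the three parts with "|".
  rhs <- vapply(list(x, y1, x2), paste, character(1), collapse = " + ")
  as.formula(paste(y, "~", paste(rhs, collapse = " | ")))
}

f <- build_iv_formula(
  y  = "d2",
  x  = c("ltass", "roe", "div"),
  y1 = c("eqrat", "bonus"),
  x2 = c("ltass", "roe", "div", "gap", "cfa")
)
print(f)
```

The printed formula should match the one in the documentation example; putting eqrat or bonus into x2 makes the helper stop with an error.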


An example:

Suppose you want to build a model that predicts whether a woman in a two-parent household is employed. You have these variables:

  • fem_works, the response or dependent variable;
  • fem_edu, the education level of the woman, exogenous;
  • kids, number of kids of the couple, exogenous;
  • other_income, the income of the household, endogenous (you know this from prior knowledge);
  • male_edu, the education level of the man, instrument (you choose this).

With ivprobit, this would be:

mod <- ivprobit(fem_works ~ fem_edu + kids | other_income | fem_edu + kids + male_edu, data)

other_income is problematic for the model because you suspect that it is correlated with the error term (other shocks may affect both fem_works and other_income). You therefore decide to use male_edu as an instrument, in order to "alleviate" that problem. (Example taken from here)
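To see why x2 lists the exogenous variables again, it may help to write out a two-step "control function" version of this example by hand. This is a sketch of a related technique, not a reproduction of what ivprobit computes internally. The data are simulated, and every coefficient in the simulation is an arbitrary assumption chosen only so that the code runs.

```r
set.seed(1)
n <- 2000

# Simulated data (all numbers below are made up for illustration).
fem_edu  <- rnorm(n, mean = 12, sd = 2)
kids     <- rpois(n, lambda = 1.5)
male_edu <- rnorm(n, mean = 12, sd = 2)
u        <- rnorm(n)  # unobserved shock hitting both equations

# other_income depends on the shock u, which is what makes it endogenous.
other_income <- 10 + 0.5 * fem_edu + 0.3 * kids + 1.0 * male_edu + u + rnorm(n)
latent       <- 1 + 0.2 * fem_edu - 0.4 * kids - 0.1 * other_income + u + rnorm(n)
fem_works    <- as.integer(latent > 0)

dat <- data.frame(fem_works, fem_edu, kids, other_income, male_edu)

# Step 1: regress the endogenous variable (y1) on the complete set of
# instruments (x2 = fem_edu + kids + male_edu).
stage1 <- lm(other_income ~ fem_edu + kids + male_edu, data = dat)
dat$v_hat <- residuals(stage1)

# Step 2: probit of y on x, y1 and the first-stage residuals.
stage2 <- glm(fem_works ~ fem_edu + kids + other_income + v_hat,
              family = binomial(link = "probit"), data = dat)
print(coef(stage2))
```

The first step uses exactly the variables after the second `|` in the ivprobit formula, and the second step uses the variables before it plus other_income. The coefficient on v_hat picks up the part of other_income that moves with the error term.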

RLave answered Nov 16 '22 16:11