I am trying to understand the syntax of the "ivprobit" function in "ivprobit" package in R. The instruction says:
Usage
ivprobit(formula, data)
Arguments
formula y~x|y1|x2, where y is the dichotomous l.h.s., x is the r.h.s.
exogenous variables, y1 is the r.h.s. endogenous variables, and
x2 is the complete set of instruments
data the dataframe
Then it shows the corresponding example:
data(eco)
pro<-ivprobit(d2~ltass+roe+div|eqrat+bonus|ltass+roe+div+gap+cfa,eco)
summary(pro)
If I match with the instruction's explanation,
y= d2 = dichotomous l.h.s.
x= ltass+roe+div = the r.h.s. exogenous variables
y1= eqrat+bonus = the r.h.s. endogenous variables
x2= ltass+roe+div+gap+cfa = the complete set of instruments
I do not understand the difference between x and x2. Also, if x2 is the complete set of instruments, why doesn't it include the endogenous variables y1 as well? It instead additionally includes "gap" and "cfa" variables which are not even shown in x (exogenous variables) or even in y either.
If, let's say, my chosen instrumental variables are indeed "eqrat" and "bonus", how should I construct the formula, given the difference between x (exogenous variables) and x2 (the complete set of instruments)?
Note that here we are discussing syntax, not the "goodness" of the model; for that kind of question you should refer to https://stats.stackexchange.com/.
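Let's use this equation as an example: $y = x\beta + y_1\gamma + \varepsilon$.
As correctly pointed out, the instruments $x_2$ are not really in the equation; it's just an example.
Here:
$y$ is the dependent variable;
$x$ are the exogenous variables (not problematic);
$y_1$ are the endogenous variables (one or more), which are "problematic";
Why are the endogenous variables problematic? Because they are correlated with the error $\varepsilon$, which causes problems with classic OLS estimation.
$x_2$ are the instruments, because they have two fundamental properties: they are correlated with the endogenous variables, and they are uncorrelated with the error.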
In the syntax proposed, we have:
x, exogenous (not problematic);
y1, endogenous (problematic);
x2, the complete set of instruments.
In the example you cite, x2 contains all the variables in x, which is the set of exogenous variables (not problematic), plus two more instruments (gap and cfa).
The model is using the 3 exogenous variables as instruments, plus two more variables.
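To make the split concrete, here is a minimal base R sketch that takes the example formula apart by hand. It does not use the ivprobit package at all; it only relies on how R parses `|` inside a formula, and the variable names `excluded_instruments`, `x`, `y1` and `x2` are chosen here for illustration.

```r
# The example formula from the package documentation.
f <- d2 ~ ltass + roe + div | eqrat + bonus | ltass + roe + div + gap + cfa

# R parses `a | b | c` as `(a | b) | c`, so the three parts can be
# picked out of the call tree directly.
rhs <- f[[3]]                      # everything to the right of `~`
x   <- all.vars(rhs[[2]][[2]])     # exogenous regressors
y1  <- all.vars(rhs[[2]][[3]])     # endogenous regressors
x2  <- all.vars(rhs[[3]])          # complete set of instruments

# Instruments that are not themselves regressors in the equation.
excluded_instruments <- setdiff(x2, x)

print(excluded_instruments)        # "gap" "cfa"
print(all(x %in% x2))              # TRUE: every exogenous variable is also an instrument
print(intersect(y1, x2))           # character(0): no endogenous variable is an instrument
```

So x2 is x plus the extra instruments, and y1 never appears in it.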
I do not understand the difference between x and x2
x2 are the instruments, which may or may not overlap with the set of exogenous variables (x).
if x2 is the complete set of instruments, why doesn't it include the endogenous variables y1 as well?
It mustn't include the endogenous variables, because those are the ones that the equation needs to take care of, using the instruments.
An example:
Suppose you want to build a model that predicts whether a woman in a two-parent household is employed. You have these variables:
fem_works, the response or dependent variable;
fem_edu, the education level of the woman, exogenous;
kids, the number of kids of the couple, exogenous;
other_income, the income of the household, endogenous (you know this from prior knowledge);
male_edu, the education level of the man, instrument (you choose this).
With ivprobit, this would be:
mod <- ivprobit(fem_works ~ fem_edu + kids | other_income | fem_edu + kids + male_edu, data)
other_income is problematic for the model because you suspect that it is correlated with the error term (other shocks may affect both fem_works and other_income), so you decide to use male_edu as an instrument in order to "alleviate" that problem. (Example taken from here)
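If it helps to see the problem in numbers, below is a rough simulation sketch in base R. Everything in it is an assumption made for illustration: the data-generating process, the coefficients, and the unobserved `shock` variable are invented. The correction it uses is a simple two-step control-function probit (first-stage `lm`, then `glm` with the first-stage residual added), which is a stand-in and not necessarily the estimator that ivprobit implements.

```r
set.seed(1)
n <- 5000

# Hypothetical data-generating process (all numbers are made up).
fem_edu  <- rnorm(n, mean = 12, sd = 2)
male_edu <- rnorm(n, mean = 12, sd = 2)
kids     <- rpois(n, lambda = 1.5)
shock    <- rnorm(n)  # unobserved; affects both income and employment

other_income <- 0.5 * male_edu + 0.3 * fem_edu + shock + rnorm(n)
latent       <- 0.5 + 0.4 * fem_edu - 0.3 * kids - 0.5 * other_income +
                shock + rnorm(n)
fem_works    <- as.integer(latent > 0)
dat <- data.frame(fem_works, fem_edu, kids, other_income, male_edu)

# Naive probit: treats other_income as if it were exogenous.
naive <- glm(fem_works ~ fem_edu + kids + other_income,
             family = binomial(link = "probit"), data = dat)

# First stage: endogenous variable on the complete set of instruments
# (the exogenous regressors plus male_edu).
first_stage <- lm(other_income ~ fem_edu + kids + male_edu, data = dat)
dat$v_hat <- residuals(first_stage)

# Second stage: probit with the first-stage residual as a control.
cf <- glm(fem_works ~ fem_edu + kids + other_income + v_hat,
          family = binomial(link = "probit"), data = dat)

# Instrument relevance: male_edu should matter for other_income.
print(coef(first_stage)["male_edu"])

# Compare the other_income coefficient with and without the correction.
print(c(naive = coef(naive)[["other_income"]],
        control_function = coef(cf)[["other_income"]]))
```

In this made-up setup the shared shock pushes the naive coefficient on other_income toward zero, while the version that uses male_edu as an instrument recovers a clearly more negative effect. The first-stage formula has the same shape as the third part of the ivprobit formula: all exogenous variables plus the extra instrument.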