
Heckman selection model in R manually

I would like to estimate a Heckman selection model manually in R. My problem is that the standard errors from the manual second stage are wrong. Is there a way to correct these manually as well?

Below is my sample code: first the sampleSelection version (correct SEs), then the manual version (correct estimates, wrong SEs).

library(sampleSelection)  # provides selection(), probit(), invMillsRatio() and the Mroz87 data

data( Mroz87 )
Mroz87$kids <- ( Mroz87$kids5 + Mroz87$kids618 > 0 )

Using sampleSelection

heckman <- selection(selection = lfp ~ age + I(age^2) + faminc + kids + educ, outcome = wage ~ exper + I(exper^2) + educ + city, 
                                data = Mroz87, method = "2step")
summary(heckman)

Manually

seleqn1 <- glm(lfp ~ age + I(age^2) + faminc + kids + educ, family=binomial(link="probit"), data=Mroz87)
summary(seleqn1)

# Inverse Mills ratio computed by hand from the probit linear predictor
Mroz87$IMR <- dnorm(seleqn1$linear.predictors)/pnorm(seleqn1$linear.predictors)

# Outcome equation with the IMR as an extra regressor: correct estimates, wrong SEs
outeqn1 <- lm(wage ~ exper + I(exper^2) + educ + city + IMR, data=Mroz87, subset=(lfp==1))
summary(outeqn1)
research111 asked Aug 21 '16 22:08




1 Answer

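# First stage: probit() from sampleSelection, fit without an intercept (- 1)
myprobit    <- probit(lfp ~ age + I(age^2) + faminc + kids + educ - 1, x = TRUE, 
                           iterlim = 30, data=Mroz87)

# invMillsRatio() returns a data frame; IMR1 is the ratio for the selected group
imrData     <- invMillsRatio(myprobit) # same as yours in this particular case
Mroz87$IMR1 <- imrData$IMR1

# Second stage: outcome equation on the selected sample, also without an intercept
outeqn1     <- lm(wage ~ -1 + exper + I(exper^2) + educ + city + IMR1, 
                  data=Mroz87, subset=(lfp==1))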

The main difference is that your models include an intercept, whereas the ones above are fit without one (note the `- 1` in both formulas).
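
On the standard errors themselves: the second-stage `lm()` output ignores two things. The IMR is an estimated regressor, and the errors in the selected sample are heteroskedastic. A minimal base-R sketch of the usual two-step covariance correction follows. The function name `heckman_vcov`, its arguments, and the simulated data are illustrative assumptions and not part of sampleSelection. The sketch assumes neither fit dropped rows for missing values, and it is not guaranteed to match `summary(heckman)` digit for digit.

```r
# Hypothetical helper: corrected covariance matrix for the second stage of a
# Heckman two-step fit. `selected` is a logical vector over ALL rows that marks
# the observations used in the outcome regression.
heckman_vcov <- function(probit_fit, outcome_fit, selected, imr_name = "IMR") {
  W  <- model.matrix(outcome_fit)                        # outcome regressors incl. IMR
  Z  <- model.matrix(probit_fit)[selected, , drop = FALSE] # probit regressors, selected rows
  zg <- probit_fit$linear.predictors[selected]           # probit index z'gamma
  imr   <- dnorm(zg) / pnorm(zg)
  delta <- imr * (imr + zg)                              # lies strictly between 0 and 1

  b_imr  <- coef(outcome_fit)[[imr_name]]                # estimates rho * sigma
  res    <- residuals(outcome_fit)
  sigma2 <- sum(res^2) / length(res) + mean(delta) * b_imr^2
  rho2   <- b_imr^2 / sigma2

  WtW_inv <- solve(crossprod(W))
  WdZ     <- crossprod(W, delta * Z)                     # W' Delta Z
  Q       <- rho2 * WdZ %*% vcov(probit_fit) %*% t(WdZ)  # first-stage uncertainty
  middle  <- crossprod(W, (1 - rho2 * delta) * W) + Q    # heteroskedasticity + Q
  sigma2 * WtW_inv %*% middle %*% WtW_inv
}

# Simulated example (all names and parameter values are made up for illustration)
set.seed(42)
n <- 2000
d <- data.frame(x = rnorm(n), z = rnorm(n))
u <- rnorm(n)                                            # shared error component
d$sel <- as.integer(0.3 + d$z + 0.5 * d$x + u > 0)       # selection equation
d$y   <- 1 + 2 * d$x + 0.5 * u + rnorm(n, sd = sqrt(0.75))

probit_fit  <- glm(sel ~ z + x, family = binomial(link = "probit"), data = d)
d$IMR       <- dnorm(probit_fit$linear.predictors) / pnorm(probit_fit$linear.predictors)
outcome_fit <- lm(y ~ x + IMR, data = d, subset = sel == 1)

V <- heckman_vcov(probit_fit, outcome_fit, d$sel == 1)
cbind(naive = sqrt(diag(vcov(outcome_fit))), corrected = sqrt(diag(V)))
```

With the objects from the question, the call would presumably be `heckman_vcov(seleqn1, outeqn1, Mroz87$lfp == 1)`, and `sqrt(diag(...))` of the result gives the corrected standard errors to compare against `summary(heckman)`.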

Hack-R answered Sep 17 '22 20:09
