I use the example from the sampleSelection package
## Greene( 2003 ): example 22.8, page 786
data( Mroz87 )
Mroz87$kids <- ( Mroz87$kids5 + Mroz87$kids618 > 0 )
# Two-step estimation
test1 = heckit( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87 )
# ML estimation
test2 = selection( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87 )
pr2 <- predict(test2,Mroz87)
pr1 <- predict(test1,Mroz87)
My problem is that the predict function does not work. I get this error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('selection', 'maxLik', 'maxim', 'list')"
The predict function works for many models so I wonder why I get an error for heckman regression models.
-----------UPDATE----------- I made some progress but I still need your help. I build an original heckman model for comparsion:
data( Mroz87 )
Mroz87$kids <- ( Mroz87$kids5 + Mroz87$kids618 > 0 )
test1 = heckit( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87[1:600,] )
After that I start building it on my own. Heckman model requires a selection equation:
zi* = wi γ + ui
where zi =1 if zi* >0 and zi = 0 if zi* <=0
after you calculate yi = xi*beta +ei ONLY for the cases where zi*>0
I build the probit model first:
library(MASS)
#probit1 = probit(lfp ~ age + I( age^2 ) + faminc + kids + educ, Mroz87, x = TRUE, print.level = print.level - 1, iterlim = 30)
myprobit <- glm(lfp ~ age + I( age^2 ) + faminc + kids + educ, family = binomial(link = "probit"),
data = Mroz87[1:600,])
summary(myprobit)
The model is exactly the same just as with the heckit command.
Then I build a lm model:
#get predictions for the variables (the data is not needed but I specify it anyway)
selectvar <- predict(myprobit,data = Mroz87[1:600,])
#bind the prediction to the table (I build a new one in my case)
newdata = cbind(Mroz87[1:600,],selectvar)
#Build an lm model for the subset where zi>0
lm1 = lm(wage ~ exper + I( exper^2 ) + educ + city , newdata, subset = selectvar > 0)
summary(lm1)
My issue now is that the lm model does not much the one created by heckit. I have no idea why. Any ideas?
Here is an implementation of the predict.selection
function -- it produces 4 different types of predictions (which are explained here):
library(Formula)
library(sampleSelection)
predict.selection = function(objSelection, dfPred,
type = c('link', 'prob', 'cond', 'uncond')) {
# construct the Formula object
tempS = evalq(objSelection$call$selection)
tempO = evalq(objSelection$call$outcome)
FormHeck = as.Formula(paste0(tempO[2], '|', tempS[2], '~', tempO[3], '|', tempS[3]))
# regressor matrix for the selection equation
mXSelection = model.matrix(FormHeck, data = dfPred, rhs = 2)
# regressor matrix for the outcome equation
mXOutcome = model.matrix(FormHeck, data = dfPred, rhs = 1)
# indices of the various parameters in selectionObject$estimate
vIndexBetaS = objSelection$param$index$betaS
vIndexBetaO = objSelection$param$index$betaO
vIndexErr = objSelection$param$index$errTerms
# get the estimates
vBetaS = objSelection$estimate[vIndexBetaS]
vBetaO = objSelection$estimate[vIndexBetaO]
dLambda = objSelection$estimate[vIndexErr['rho']]*
objSelection$estimate[vIndexErr['sigma']]
# depending on the type of prediction requested, return
# TODO allow the return of multiple prediction types
pred = switch(type,
link = mXSelection %*% vBetaS,
prob = pnorm(mXSelection %*% vBetaS),
uncond = mXOutcome %*% vBetaO,
cond = mXOutcome %*% vBetaO +
dnorm(temp <- mXSelection %*% vBetaS)/pnorm(temp) * dLambda)
return(pred)
}
Suppose you estimate the following Heckman sample selection model using MLE:
data(Mroz87)
# define a new variable
Mroz87$kids = (Mroz87$kids5 + Mroz87$kids618 > 0)
# create the estimation sample
Mroz87Est = Mroz87[1:600, ]
# create the hold out sample
Mroz87Holdout = Mroz87[601:nrow(Mroz87), ]
# estimate the model using MLE
heckML = selection(selection = lfp ~ age + I(age^2) + faminc + kids + educ,
outcome = wage ~ exper + I(exper^2) + educ + city, data = Mroz87Est)
summary(heckML)
The different types of predictions are computed as below:
vProb = predict(objSelection = heckML, dfPred = Mroz87Holdout, type = 'prob')
vLink = predict(objSelection = heckML, dfPred = Mroz87Holdout, type = 'link')
vCond = predict(objSelection = heckML, dfPred = Mroz87Holdout, type = 'cond')
vUncond = predict(objSelection = heckML, dfPred = Mroz87Holdout, type = 'uncond')
You can verify these computation on a platform that produces these outputs, such as Stata.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With