I am following the GenMatch example from the Matching package documentation (link to PDF here):
library(Matching)
data(lalonde)
attach(lalonde)
X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)
BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
I(re74*re75))
genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
pop.size=16, max.generations=10, wait.generations=1)
Y=re78/1000
mout <- Match(Y=Y, Tr=treat, X=X, Weight.matrix=genout)
summary(mout)
We see that all the treatment cases are matched with control cases. Now let's say we want exact matching on married status (or any other variable), but we still want to use the GenMatch matrix created before.
Referring to the documentation for the exact argument:
Exact = .....If a logical vector is provided, a logical value should be provided for each covariate in X. Using a logical vector allows the user to specify exact matching for some but not other variables. When exact matches are not found, observations are dropped.
Therefore, is the following correct?
mout2 <- Match(Y=Y, Tr=treat, X=X, exact=c(0,0,0,0,1,0,0,0,0,0), Weight.matrix=genout)
summary(mout2)
I would say it is not correct, because if you compare
summary(mout$weights)
summary(mout2$weights)
you get the same values.
I should start by saying that I have never used these packages and functions before; my answer is based purely on playing with your code and reading the functions' documentation.
It seems that Weight.matrix takes precedence over exact in the Match() function, and that this is poorly documented and raises no warning. There's a hint in its help page (?Match):
Weight.matrix: ...
This code changes the weights implied by the inverse of the variances by multiplying the first variable by a 1000 so that it is highly weighted. In order to enforce exact matching see the exact and caliper options.
When it says you should use exact to enforce exact matching (as opposed to giving the weights calculated manually or from GenMatch()), it seems to me that it's saying you should use one or the other. The behaviour, however, is that exact seems to be ignored when you provide an argument to Weight.matrix. Remove Weight.matrix from the call and you'll get different results:
> mout2 <- Match(Y=Y, Tr=treat, X=X, exact=c(0,0,0,0,1,0,0,0,0,0))
> summary(mout2)
Estimate... 1.7605
AI SE...... 0.86408
T-stat..... 2.0374
p.val...... 0.041606
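Comparing the weights is only an indirect test. A more direct check, sketched below, is whether each matched pair actually agrees on married. It assumes the index.treated and index.control components that Match() returns; pairs_agree is a helper name made up for this illustration, and the last part only runs if Matching is installed and mout2 from the question exists in the session.

```r
# Hypothetical helper: TRUE if every matched pair has the same value of x.
pairs_agree <- function(x, idx_treated, idx_control) {
  all(x[idx_treated] == x[idx_control])
}

# Only runs when Matching is installed and mout2 has already been created.
if (requireNamespace("Matching", quietly = TRUE) && exists("mout2")) {
  data(lalonde, package = "Matching")
  # TRUE would mean exact matching on married was enforced in every pair.
  print(pairs_agree(lalonde$married, mout2$index.treated, mout2$index.control))
}
```

If this prints FALSE for the call that passes both exact and Weight.matrix, that would support the idea that exact is being ignored there.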
I can't go into detail about the implications of this change, simply because I'm not familiar with the theory behind it.
I checked the source of Match(), but there's nothing useful there besides that it calls a function called RmatchLoop(), which I wasn't able to find anywhere (I'm guessing it's package internal and some other voodoo is necessary to see it).
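For what it's worth, objects that a package does not export can usually be inspected through its namespace. A small sketch, assuming RmatchLoop lives in the Matching namespace (I haven't verified that); get_internal is a name invented here:

```r
# Hypothetical helper: fetch an object from a package namespace,
# whether or not the package exports it.
get_internal <- function(name, pkg) {
  get(name, envir = asNamespace(pkg), inherits = FALSE)
}

# Demonstration with a function that ships with R's stats package.
print(is.function(get_internal("median.default", "stats")))

# The same idea for Matching, guarded in case it is not installed.
if (requireNamespace("Matching", quietly = TRUE)) {
  # getAnywhere() also searches loaded namespaces; it reports
  # "no object named ... was found" instead of failing if the name is absent.
  print(utils::getAnywhere("RmatchLoop"))
}
```

Interactively, Matching:::RmatchLoop does the same lookup in one step, but it throws an error if the name does not exist.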
Based on this, I think the question you need to settle is whether or not it makes sense to use both arguments, and from what I read, it doesn't. There's no reason to give different weights to each covariate if you in fact only want to match on one of them.
By the way, your code could use some improvements, such as:
Avoid attach, it's dangerous if you decide to use variables with the same names as your data columns.
Instead of cbinding almost all columns of a dataframe, just subset out the ones you don't want:
X <- lalonde[,!(colnames(lalonde)=="re78" | colnames(lalonde) == "treat")]
#or
X <- subset(lalonde, select=-c(re78, treat)) #Subset is shorter in this case, but usually not recommended
#instead of
X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)
The same thing can be done for BalanceMat. And another advantage is that you keep the data as a dataframe.
For the exact argument, a cleaner way would be:
exact = colnames(X)=="married"
This way you are less exposed to changes in the column order, etc.
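The same idea extends to several variables with %in%. A toy sketch (the data frame below is made up; only the column names mirror lalonde):

```r
# Made-up stand-in for the covariate data frame; values are arbitrary.
X_toy <- data.frame(age = c(25, 40), educ = c(12, 10),
                    black = c(1, 0), married = c(0, 1))

# Exact matching on one variable, selected by name.
exact_one <- colnames(X_toy) == "married"

# Exact matching on several variables at once.
exact_many <- colnames(X_toy) %in% c("black", "married")

# Reordering the columns does not break the selection.
X_reordered <- X_toy[, c("married", "age", "black", "educ")]
exact_reordered <- colnames(X_reordered) %in% c("black", "married")

print(exact_one)
print(exact_many)
print(exact_reordered)
```

Because the logical vector is derived from the column names, it always lines up with whatever X is passed alongside it.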