
Exact matching and GenMatch in R

Tags:

matching

r

I am following the example from the Matching package, in particular the GenMatch example (link to PDF here):

library(Matching)
data(lalonde)
attach(lalonde)

X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)

BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
                    I(re74*re75))

genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
                   pop.size=16, max.generations=10, wait.generations=1)

Y=re78/1000

mout <- Match(Y=Y, Tr=treat, X=X, Weight.matrix=genout)
summary(mout)

We see that all the treatment cases are matched with control cases. Now let's say we want exact matching on married status (or any other variable), but we still want to use the GenMatch matrix created before.

Referring to the link

Exact = .....If a logical vector is provided, a logical value should be provided for each covariate in X. Using a logical vector allows the user to specify exact matching for some but not other variables. When exact matches are not found, observations are dropped.

Is the following therefore correct?

mout2 <- Match(Y=Y, Tr=treat, X=X, exact=c(0,0,0,0,1,0,0,0,0,0), Weight.matrix=genout)
summary(mout2)

I would say it is not correct, because if you compare

summary(mout$weights)
summary(mout2$weights)

you get the same values.

asked May 01 '15 10:05 by lukeg




1 Answer

I should start by saying that I have never used these packages and functions before; my answer is based purely on playing with your code and reading the functions' documentation.

It seems that Weight.matrix takes precedence over exact in the Match() function, and that this is poorly documented and produces no warning. There's a hint in its help page (?Match):

Weight.matrix: ...

This code changes the weights implied by the inverse of the variances by multiplying the first variable by a 1000 so that it is highly weighted. In order to enforce exact matching see the exact and caliper options.

When it says you should use exact to enforce exact matching (as opposed to giving the weights calculated manually or from GenMatch()), it seems to me that it's saying you should use one or the other. The behaviour, however, is that exact seems to be ignored when you provide an argument to Weight.matrix. Remove Weight.matrix from the call, and you'll get different results:

> mout2 <- Match(Y=Y, Tr=treat, X=X, exact=c(0,0,0,0,1,0,0,0,0,0))
> summary(mout2)

Estimate...  1.7605 
AI SE......  0.86408 
T-stat.....  2.0374 
p.val......  0.041606 

I can't go into detail about the implications of this change, simply because I'm not familiar with the theory behind it.
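Rather than comparing the weights, one way to check more directly whether exact matching was enforced is to look at the matched pairs themselves. The sketch below uses a hypothetical helper, pairs_agree() (not part of the Matching package), and made-up toy values; it only assumes that the matched observation indices are available as two vectors of equal length.

```r
# Hypothetical helper (not from the Matching package):
# TRUE if every matched pair has the same value of covariate x.
pairs_agree <- function(x, index.treated, index.control) {
  stopifnot(length(index.treated) == length(index.control))
  all(x[index.treated] == x[index.control])
}

# Toy illustration with made-up values (not the lalonde data):
married <- c(1, 0, 1, 0, 1)
ok  <- pairs_agree(married, c(1, 3), c(5, 1))  # every pair agrees on married
bad <- pairs_agree(married, c(1, 3), c(2, 5))  # first pair disagrees
```

With the objects from the question, the call would be something like pairs_agree(lalonde$married, mout2$index.treated, mout2$index.control), using the index.treated and index.control components of the object returned by Match(). If it returns FALSE, exact matching on married was not enforced.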

I checked the source of Match(), but there's nothing useful there besides that it calls a function called RmatchLoop(), which I wasn't able to find anywhere (I'm guessing it's package internal and some other voodoo is necessary to see it).

Based on this, I think that your judgment should be whether or not it makes sense to use both arguments, and from what I read, it doesn't. There's no reason to give different weights to each covariate if you in fact only want to match on one of them.


By the way, your code could use some improvements, such as:

  1. Avoid using attach; it's dangerous if you decide to use variables with the same names as your data columns.
  2. Instead of cbinding almost all columns of a dataframe, just subset out the ones you don't want:

Code:

X <- lalonde[,!(colnames(lalonde)=="re78" | colnames(lalonde) == "treat")]
#or
X <- subset(lalonde, select=-c(re78, treat)) #Subset is shorter in this case, but usually not recommended
#instead of
X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)

The same thing can be done for BalanceMat. And another advantage is that you keep the data as a dataframe.

  3. Also, for the exact argument, a cleaner way would be:

Code:

exact = colnames(X)=="married"

This way you are less exposed to changes in column order, etc.
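Putting the last two suggestions together, here is a minimal sketch on a toy data frame (the values and the name toy are made up for illustration; it is only a stand-in for lalonde). It drops the outcome and treatment columns by name and then builds the exact vector from the remaining column names, with a guard against a misspelled name.

```r
# Toy stand-in for lalonde (values are made up for illustration only).
toy <- data.frame(
  treat   = c(1, 0, 1, 0),
  age     = c(25, 31, 22, 40),
  married = c(1, 0, 0, 1),
  educ    = c(12, 10, 11, 9),
  re78    = c(5000, 0, 1200, 800)
)

# Drop the outcome and the treatment indicator by name.
X <- toy[, setdiff(colnames(toy), c("re78", "treat")), drop = FALSE]

# One logical per column of X: TRUE only for the covariate to match exactly.
exact <- colnames(X) == "married"

# Guard against a misspelled or missing column name.
stopifnot(length(exact) == ncol(X), sum(exact) == 1)
```

Because exact is derived from colnames(X), it stays aligned with X even if columns are added or reordered later.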

answered Nov 15 '22 04:11 by Molx