Exact matching and GenMatch in R

Tags:

r

I am following the example from the Matching package, in particular the GenMatch example (link to PDF here):

library(Matching)
data(lalonde)
attach(lalonde)

X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)

BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
                    I(re74*re75))

genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
                   pop.size=16, max.generations=10, wait.generations=1)

Y=re78/1000

mout <- Match(Y=Y, Tr=treat, X=X, Weight.matrix=genout)
summary(mout)

We see that all the treatment cases are matched with control cases. Now let's say we want exact matching on married status (or any other variable), but still want to use the weight matrix that GenMatch created before.

The documentation linked above says:

Exact = .....If a logical vector is provided, a logical value should be provided for each covariate in X. Using a logical vector allows the user to specify exact matching for some but not other variables. When exact matches are not found, observations are dropped.

Is the following therefore correct?

mout2 <- Match(Y=Y, Tr=treat, X=X, exact=c(0,0,0,0,1,0,0,0,0,0), Weight.matrix=genout)
summary(mout2)

I suspect it is not correct, because if you compare

summary(mout$weights)
summary(mout2$weights)

you get the same values.

asked May 01 '15 10:05

lukeg

1 Answer

I should start by saying that I have never used these packages and functions before; my answer is based purely on playing with your code and reading the functions' documentation.

It seems that Weight.matrix takes precedence over exact in the Match() function, and that this is poorly documented and raises no warning. There's a hint in its help page (?Match):

Weight.matrix: ...

This code changes the weights implied by the inverse of the variances by multiplying the first variable by a 1000 so that it is highly weighted. In order to enforce exact matching see the exact and caliper options.

When it says you should use exact to enforce exact matching (as opposed to supplying weights calculated manually or by GenMatch()), it seems to me that it's saying you should use one or the other. The observed behaviour, however, is that exact seems to be ignored when you pass an argument to Weight.matrix. Remove Weight.matrix from the call and you'll get different results:

> mout2 <- Match(Y=Y, Tr=treat, X=X, exact=c(0,0,0,0,1,0,0,0,0,0))
> summary(mout2)

Estimate...  1.7605 
AI SE......  0.86408 
T-stat.....  2.0374 
p.val......  0.041606
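
Comparing weights or estimates is an indirect check. A more direct one, sketched below, is to test whether every matched pair agrees on married. This assumes the object returned by Match() carries the index.treated and index.control vectors described on its help page; the helper pairs_agree() and the toy vectors are made up for illustration, and only base R is used.

```r
# Hypothetical helper: TRUE if every matched pair shares the same covariate value.
pairs_agree <- function(values, idx_treated, idx_control) {
  all(values[idx_treated] == values[idx_control])
}

# Toy data (made up): a binary covariate and two candidate sets of matched pairs.
married <- c(1, 0, 1, 0, 1, 0)
exact_pairs   <- pairs_agree(married, idx_treated = c(1, 2), idx_control = c(3, 4))
inexact_pairs <- pairs_agree(married, idx_treated = c(1, 2), idx_control = c(4, 3))

print(exact_pairs)    # TRUE: both pairs agree on married
print(inexact_pairs)  # FALSE: unit 1 (married) is paired with unit 4 (not married)

# With the objects from the question this would be (not run here):
# pairs_agree(lalonde$married, mout2$index.treated, mout2$index.control)
```

If the call on mout2 returns FALSE, the exact constraint on married was not applied to that match.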

I can't go into detail about what the implications of this change are, simply because I'm not familiar with the theory behind it.

I checked the source of Match(), but there's nothing useful there besides that it calls a function called RmatchLoop(), which I wasn't able to find anywhere (I'm guessing it's package internal and some other voodoo is necessary to see it).

Based on this, I think your judgment call should be whether or not it makes sense to use both arguments, and from what I read, it doesn't. There's no reason to give different weights to each covariate if you in fact only want to match on one of them.

By the way, your code could use some improvements, such as:

Avoid using attach; it's dangerous if you later create variables with the same names as your data columns.
Instead of cbinding almost all the columns of a data frame, just subset out the ones you don't want:

Code:

X <- lalonde[,!(colnames(lalonde)=="re78" | colnames(lalonde) == "treat")]
#or
X <- subset(lalonde, select=-c(re78, treat)) #Subset is shorter in this case, but usually not recommended
#instead of
X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)

The same thing can be done for BalanceMat. Another advantage is that you keep the data as a data frame.
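
As a rough sketch of what that could look like for BalanceMat without attach, here is a version on a small made-up stand-in for lalonde (the toy data frame and the column name re74_re75 are assumptions for illustration, not part of the package). The final step converts to a matrix because the interaction column is added with cbind.

```r
# Toy stand-in for lalonde (values are made up; only two covariates are kept).
toy <- data.frame(treat = c(1, 0, 1, 0),
                  re74  = c(0, 10, 20, 30),
                  re75  = c(5, 0, 15, 25),
                  re78  = c(9, 8, 7, 6))

# Drop the outcome and the treatment indicator by name; no attach() needed.
X <- toy[, !(colnames(toy) %in% c("re78", "treat"))]

# Balance matrix: the covariates plus the re74 * re75 interaction.
BalanceMat <- cbind(as.matrix(X), re74_re75 = toy$re74 * toy$re75)

print(colnames(BalanceMat))
```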

Also, for the exact argument, a cleaner way would be:

Code:

exact = colnames(X)=="married"

This way the code does not break if the column order changes.
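
The same idea extends to several variables with %in%. A minimal sketch, using a hard-coded vector of column names (copied from the cbind() call in the question) so that it runs without the Matching package; choosing black as a second variable is only an example:

```r
# Column names in the order used for X in the question.
x_names <- c("age", "educ", "black", "hisp", "married",
             "nodegr", "u74", "u75", "re75", "re74")

# One logical per covariate: TRUE where exact matching is wanted.
exact_one  <- x_names == "married"
exact_many <- x_names %in% c("married", "black")

print(which(exact_one))   # position 5
print(which(exact_many))  # positions 3 and 5
```

In real use, x_names would be colnames(X), so the vector always has one entry per covariate in X.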

answered Nov 15 '22 04:11

Molx
