
What is the Weight Matrix generated in the Matching package?

Tags: matching, r

Referring to the Matching package, we are looking at the example that uses GenMatch.

We read that the Weight.matrix it creates is a matrix whose diagonal corresponds to the weight given to each variable in X.

But we are not sure what the generated values represent. Are they related to a standard deviation?

Let's take the example provided for GenMatch:

library(Matching)
data(lalonde)
attach(lalonde)
#The covariates we want to match on
X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)
#The covariates we want to obtain balance on
BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
I(re74*re75))
#Let's call GenMatch() to find the optimal weight to give each
#covariate in 'X' so as we have achieved balance on the covariates in
#'BalanceMat'. This is only an example so we want GenMatch to be quick
#so the population size has been set to be only 16 via the 'pop.size'
#option. This is *WAY* too small for actual problems.
#For details see http://sekhon.berkeley.edu/papers/MatchingJSS.pdf.
#
genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
pop.size=16, max.generations=10, wait.generations=1)

Then we can output the Weight.matrix, which is used later to pair the data:

genout$Weight.matrix

And, in particular, the value assigned to age:

genout$Weight.matrix[1,1]

We get a value of ~205. But what does this weight mean or represent?

Furthermore, if we randomise the order of the data, the value changes from run to run:

n <- 100
P1 <- rep(NA, n)
for (i in 1:n) {

  lalonde <- lalonde[sample(1:nrow(lalonde)), ] # randomise order

  X = cbind(lalonde$age, lalonde$educ, lalonde$black, lalonde$hisp, 
            lalonde$married, lalonde$nodegr, lalonde$u74, lalonde$u75, 
            lalonde$re75, lalonde$re74)

  BalanceMat <- cbind(lalonde$age, lalonde$educ, lalonde$black, 
                      lalonde$hisp, lalonde$married, lalonde$nodegr, 
                      lalonde$u74, lalonde$u75, lalonde$re75, lalonde$re74, 
                      I(lalonde$re74*lalonde$re75))

  genout <- GenMatch(Tr=lalonde$treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
                     pop.size=16, max.generations=10, wait.generations=1)

  P1[i] <- genout$Weight.matrix[1,1]

}

The author of the paper also suggests that the additional information may be of assistance, but it does not explain what the weight matrix values represent. Can anyone interpret them, or explain why their magnitude changes when the order of the data is varied?

asked Jun 22 '15 at 12:06 by lukeg


1 Answer

Unfortunately this is not a question that can be answered very easily (but to answer part of your question, no, the values of the weight matrix are not related to a standard deviation).

GenMatch is an affinely invariant matching algorithm that uses a generalised distance measure d(). This measure is defined through a weight matrix W in which all elements are zero except those on the main diagonal. The main diagonal consists of k parameters, which must be chosen. (Note that if each of these k parameters is set equal to 1, d() is the same as the Mahalanobis distance.) Like the Mahalanobis distance, this metric can be used to conduct either greedy or optimal full matching. (Setting the off-diagonal elements of W to zero is done for reasons of computational power alone.)
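
To make the distance concrete, here is a minimal base R sketch. It assumes the form of d() given in the GenMatch paper as I read it, d(Xi, Xj) = sqrt((Xi - Xj)' (S^{-1/2})' W S^{-1/2} (Xi - Xj)), where S is the sample covariance matrix of X and S^{1/2} is its Cholesky factor. The toy data and the helper name gen_dist are made up for illustration; this is not the package's internal code.

```r
# Toy covariates (hypothetical, not the lalonde data).
set.seed(42)
X <- cbind(age  = rnorm(50, 30, 8),
           educ = rnorm(50, 10, 2),
           re74 = rexp(50, 1 / 2000))

S <- cov(X)
# chol() returns an upper-triangular R with t(R) %*% R == S, so L = t(R) is
# the lower Cholesky factor and solve(L) plays the role of S^{-1/2}.
S_inv_half <- solve(t(chol(S)))

# gen_dist() is an illustrative helper, not a function from the Matching package.
gen_dist <- function(xi, xj, W) {
  z <- S_inv_half %*% (xi - xj)      # standardised difference
  sqrt(drop(t(z) %*% W %*% z))       # weighted length of that difference
}

xi <- X[1, ]
xj <- X[2, ]

# With W equal to the identity, d() reduces to the Mahalanobis distance.
# Note that mahalanobis() returns the squared distance.
d_identity <- gen_dist(xi, xj, diag(ncol(X)))
d_mahal    <- sqrt(mahalanobis(xi, xj, S))
all.equal(d_identity, d_mahal)

# Unequal diagonal weights stretch some standardised directions more than others.
d_weighted <- gen_dist(xi, xj, diag(c(205, 1, 1)))
```

Under this reading, a diagonal entry such as 205 is not in the units of age or of its standard deviation. It is a multiplier on one standardised direction, relative to the other diagonal entries.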

The reason the magnitudes change when the order of the data is varied is that the weight matrix W has an infinity of equivalent solutions. The matches produced are invariant to a constant scale change to the distance measure. In particular, the matches produced are the same for W and for cW, for any positive scalar c. The absolute size of the entries is therefore arbitrary: W is identified only up to scale, and it can be made unique by many different normalisations.
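
The scale invariance can be checked with a small base R sketch. It uses nearest-neighbour matching with replacement on made-up data as a simplified stand-in for what Match() does, so the data, the helper nearest_control and the example weights are all assumptions for illustration.

```r
# Toy covariates (hypothetical, not the lalonde data); rows 1-10 are "treated".
set.seed(1)
X <- cbind(age = rnorm(40, 30, 8), educ = rnorm(40, 10, 2))
treated <- 1:10
control <- 11:40

# Standardise once: with R = chol(cov(X)), the rows of X %*% solve(R) have
# identity covariance, so a diagonal W acts directly on these columns.
Z <- X %*% solve(chol(cov(X)))

# For each treated unit, return the index of the closest control unit under
# diagonal weights w. nearest_control() is an illustrative helper.
nearest_control <- function(w) {
  sapply(treated, function(i) {
    diffs <- t(Z[control, , drop = FALSE]) - Z[i, ]   # k x n_control
    d2 <- colSums(w * diffs^2)                        # squared weighted distance
    control[which.min(d2)]
  })
}

w <- c(205, 3)   # stands in for diag(genout$Weight.matrix)

# Multiplying every weight by the same positive constant leaves the matches unchanged.
same_matches <- identical(nearest_control(w), nearest_control(4 * w))

# One possible normalisation for comparing runs: make the diagonal sum to 1.
w_normalised <- w / sum(w)
```

When comparing the output of several GenMatch runs, it is therefore more informative to look at a normalised diagonal, such as diag(W) / sum(diag(W)), than at a raw entry like W[1,1]. Even then the relative weights may differ between runs, since GenMatch is a heuristic search and the example's pop.size of 16 is very small.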

In order to fully understand how the non-zero elements of the weight matrix are calculated, I recommend reading the full article behind the formulation of GenMatch which takes a somewhat deep and complex look at the methods used.

If you're just interested in the source code, you can view it on GitHub. If you have additional questions about the R-specific code, I'll be happy to try to answer them. However, if you have further questions about the algorithms behind the generation of the weight matrix, you'll likely need to head over to Cross Validated.

answered Oct 20 '22 at 17:10 by scribbles