I am running coarsened exact matching (CEM) via the package MatchIt as a pre-processing step and want to use the matched data in further analyses. As a test, I also ran CEM using the package cem, and noticed that its imbalance measures differed from those I computed on the MatchIt output. For example, using the LaLonde dataset:
library(MatchIt)
library(cem)
data(LL)
re74cut <- seq(0, 40000, 5000)
re75cut <- seq(0, max(LL$re75)+1000, by=1000)
agecut <- c(20.5, 25.5, 30.5,35.5,40.5)
my.cutpoints <- list(re75=re75cut, re74=re74cut, age=agecut)
matchit.match <- matchit(treated ~ age + education + black + married + nodegree +
                           re74 + re75 + hispanic + u74 + u75,
                         data = LL,
                         method = "cem",
                         cutpoints = my.cutpoints)
matchit.data <- match.data(matchit.match)
matchit.imb <- imbalance(group = matchit.data$treated,
                         data = matchit.data,
                         drop = c("treated", "re78", "distance",
                                  "weights", "subclass"))
cem.match <- cem(treatment = "treated",
                 data = LL, drop = "re78",
                 cutpoints = my.cutpoints,
                 eval.imbalance = TRUE)
matchit.imb
cem.match$imbalance
Does anybody know what is going on here? Thank you for any help.
There are two reasons. First, you must supply the weights from the matchit object to imbalance(). If you include these, the (diff) statistics will be correct, but the L1 statistic will still be wrong.
Second, by using matchit.data instead of LL in the call to imbalance(), the breaks for the L1 statistic are computed using only the matched data instead of the full dataset, which yields a different value of the L1 statistic. To correct this, supply the original, unmatched dataset in the call to imbalance() and use the matching weights to provide information on the matches. So, your final call to imbalance() should look like the following:
imbalance(LL$treated,
          data = LL,
          drop = c("treated", "re78"),
          weights = matchit.match$weights)
That will produce the same results as cem.match$imbalance.
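To see why the choice of dataset changes L1 even when the weights are the same, here is a rough base R sketch on made-up data. It is a simplified univariate stand-in, not how cem computes its statistic: l1_stat() is a hypothetical helper, and the equal-width binning over the observed range is an assumption chosen only to show that breaks derived from the matched units differ from breaks derived from all units.

```r
# Hypothetical helper (not part of cem or MatchIt): weighted L1 distance between
# the treated and control histograms of one variable, for a given set of breaks.
l1_stat <- function(x, treat, w, breaks) {
  keep <- w > 0                         # zero-weight (unmatched) units drop out
  x <- x[keep]; treat <- treat[keep]; w <- w[keep]
  bins <- cut(x, breaks = breaks, include.lowest = TRUE)
  share <- function(group) {
    wg <- w * (treat == group)          # weights of this group, 0 elsewhere
    # weighted relative frequency of the group in each bin
    vapply(levels(bins), function(b) sum(wg[which(bins == b)]), numeric(1)) / sum(wg)
  }
  sum(abs(share(1) - share(0))) / 2
}

# Made-up data: the control unit with x = 20 is unmatched (weight 0).
x     <- c(0, 1, 2, 6, 8, 1, 3, 5, 7, 20)
treat <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
w     <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0)

# Assumed binning rule for illustration: four equal-width bins over the range.
breaks_full    <- seq(min(x), max(x), length.out = 5)                 # all units
breaks_matched <- seq(min(x[w > 0]), max(x[w > 0]), length.out = 5)   # matched only

l1_full    <- l1_stat(x, treat, w, breaks_full)
l1_matched <- l1_stat(x, treat, w, breaks_matched)
c(full = l1_full, matched = l1_matched)
```

The same matched units and the same weights go into both calls; only the breaks change, and the two L1 values come out different. That is the situation you get when passing matchit.data rather than LL to imbalance().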