Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to do group matching in R?

Suppose I have the data.frame below where treat == 1 means that the id received treatment and prob is the calculated probability that treat == 1.

set.seed(1)
df <- data.frame(id = 1:10, treat = sample(0:1, 10, replace = T))
df$prob <- ifelse(df$treat, rnorm(10, .8, .1), rnorm(10, .4, .4))
df
   id treat      prob
1   1     0 0.3820266
2   2     0 0.3935239
3   3     1 0.8738325
4   4     1 0.8575781
5   5     0 0.6375605
6   6     1 0.9511781
7   7     1 0.8389843
8   8     1 0.7378759
9   9     1 0.5785300
10 10     0 0.6479303

To minimize selection bias, I now wish to create pseudo treatment and control groups on the basis of the values of treat and prob:

  • When any id withtreat == 1 is within 0.1 prob of any id with treat == 0, I want the value of group to be "treated".

  • When any id withtreat == 0 is within 0.1 prob of any id with treat == 1, I want the value of group to be "control".

Below is an example of what I'd like the result to be.

df$group <- c(NA, NA, NA, NA, 'control', NA, NA, 'treated', 'treated', 'control')
df
   id treat      prob   group
1   1     0 0.3820266    <NA>
2   2     0 0.3935239    <NA>
3   3     1 0.8738325    <NA>
4   4     1 0.8575781    <NA>
5   5     0 0.6375605 control
6   6     1 0.9511781    <NA>
7   7     1 0.8389843    <NA>
8   8     1 0.7378759 treated
9   9     1 0.5785300 treated
10 10     0 0.6479303 control

How would I go about doing this? In the example above, matching is done with replacements, but a solution without replacements would be welcome, too.

like image 585
lillemets Avatar asked Apr 20 '17 12:04

lillemets


4 Answers

You can try

foo <- function(x){
  TR <- range(x$prob[x$treat == 0])
  CT <- range(x$prob[x$treat == 1])
  tmp <- sapply(1:nrow(x), function(y, z){
    if(z$treat[y] == 1){
    ifelse(any(abs(z$prob[y] - TR) <= 0.1), "treated", "NA")
    }else{
    ifelse(any(abs(z$prob[y] - CT) <= 0.1), "control", "NA")
    }}, x)
  cbind(x, group = tmp)
  }

foo(df)    
   id treat      prob   group
1   1     0 0.3820266      NA
2   2     0 0.3935239      NA
3   3     1 0.8738325      NA
4   4     1 0.8575781      NA
5   5     0 0.6375605 control
6   6     1 0.9511781      NA
7   7     1 0.8389843      NA
8   8     1 0.7378759 treated
9   9     1 0.5785300 treated
10 10     0 0.6479303 control
like image 147
Roman Avatar answered Nov 12 '22 16:11

Roman


I think this problem is well suited for cut in base R. Here is how you can do it in a vectorized way:

f <- function(r) {
      x <- cut(df[r,]$prob, breaks = c(df[!r,]$prob-0.1, df[!r,]$prob+0.1))
      df[r,][!is.na(x),]$id
}

ones <- df$treat==1
df$group <- NA

df[df$id %in% f(ones),]$group <- "treated"
df[df$id %in% f(!ones),]$group <- "control"

> df

   # id treat      prob   group
# 1   1     0 0.3820266    <NA>
# 2   2     0 0.3935239    <NA>
# 3   3     1 0.8738325    <NA>
# 4   4     1 0.8575781    <NA>
# 5   5     0 0.6375605 control
# 6   6     1 0.9511781    <NA>
# 7   7     1 0.8389843    <NA>
# 8   8     1 0.7378759 treated
# 9   9     1 0.5785300 treated
# 10 10     0 0.6479303 control
like image 43
989 Avatar answered Nov 12 '22 15:11

989


Perhaps not the most elegant but it seems to work for me:

df %>% group_by(id,treat) %>% mutate(group2 = ifelse(treat==1,
                                                 ifelse(any(abs(prob-df[df$treat==0,3])<0.1),"treated","NA"),
                                                 ifelse(any(abs(prob-df[df$treat==1,3])<0.1),"control","NA"))) # treat==0
like image 31
timfaber Avatar answered Nov 12 '22 15:11

timfaber


Is this what you want?

#Base R:

apply(df[df$treat == 1, ],1, function(x){
  ifelse(any(df[df$treat == 0, 'prob'] -.1 < x[3] & x[3] < df[df$treat == 0, 'prob'] +.1), 'treated', NA)
})

You can invert $treatclause to reflect control-group and attach the variables to your df.

like image 35
Majo Avatar answered Nov 12 '22 14:11

Majo