Suppose I have the data.frame below where treat == 1
means that the id
received treatment and prob
is the calculated probability that treat == 1
.
set.seed(1)
df <- data.frame(id = 1:10, treat = sample(0:1, 10, replace = T))
df$prob <- ifelse(df$treat, rnorm(10, .8, .1), rnorm(10, .4, .4))
df
id treat prob
1 1 0 0.3820266
2 2 0 0.3935239
3 3 1 0.8738325
4 4 1 0.8575781
5 5 0 0.6375605
6 6 1 0.9511781
7 7 1 0.8389843
8 8 1 0.7378759
9 9 1 0.5785300
10 10 0 0.6479303
To minimize selection bias, I now wish to create pseudo treatment and control groups on the basis of the values of treat
and prob
:
When any id withtreat == 1
is within 0.1 prob
of any id
with treat == 0
, I want the value of group
to be "treated".
When any id withtreat == 0
is within 0.1 prob
of any id
with treat == 1
, I want the value of group
to be "control".
Below is an example of what I'd like the result to be.
df$group <- c(NA, NA, NA, NA, 'control', NA, NA, 'treated', 'treated', 'control')
df
id treat prob group
1 1 0 0.3820266 <NA>
2 2 0 0.3935239 <NA>
3 3 1 0.8738325 <NA>
4 4 1 0.8575781 <NA>
5 5 0 0.6375605 control
6 6 1 0.9511781 <NA>
7 7 1 0.8389843 <NA>
8 8 1 0.7378759 treated
9 9 1 0.5785300 treated
10 10 0 0.6479303 control
How would I go about doing this? In the example above, matching is done with replacements, but a solution without replacements would be welcome, too.
You can try
foo <- function(x){
TR <- range(x$prob[x$treat == 0])
CT <- range(x$prob[x$treat == 1])
tmp <- sapply(1:nrow(x), function(y, z){
if(z$treat[y] == 1){
ifelse(any(abs(z$prob[y] - TR) <= 0.1), "treated", "NA")
}else{
ifelse(any(abs(z$prob[y] - CT) <= 0.1), "control", "NA")
}}, x)
cbind(x, group = tmp)
}
foo(df)
id treat prob group
1 1 0 0.3820266 NA
2 2 0 0.3935239 NA
3 3 1 0.8738325 NA
4 4 1 0.8575781 NA
5 5 0 0.6375605 control
6 6 1 0.9511781 NA
7 7 1 0.8389843 NA
8 8 1 0.7378759 treated
9 9 1 0.5785300 treated
10 10 0 0.6479303 control
I think this problem is well suited for cut
in base R. Here is how you can do it in a vectorized way:
f <- function(r) {
x <- cut(df[r,]$prob, breaks = c(df[!r,]$prob-0.1, df[!r,]$prob+0.1))
df[r,][!is.na(x),]$id
}
ones <- df$treat==1
df$group <- NA
df[df$id %in% f(ones),]$group <- "treated"
df[df$id %in% f(!ones),]$group <- "control"
> df
# id treat prob group
# 1 1 0 0.3820266 <NA>
# 2 2 0 0.3935239 <NA>
# 3 3 1 0.8738325 <NA>
# 4 4 1 0.8575781 <NA>
# 5 5 0 0.6375605 control
# 6 6 1 0.9511781 <NA>
# 7 7 1 0.8389843 <NA>
# 8 8 1 0.7378759 treated
# 9 9 1 0.5785300 treated
# 10 10 0 0.6479303 control
Perhaps not the most elegant but it seems to work for me:
df %>% group_by(id,treat) %>% mutate(group2 = ifelse(treat==1,
ifelse(any(abs(prob-df[df$treat==0,3])<0.1),"treated","NA"),
ifelse(any(abs(prob-df[df$treat==1,3])<0.1),"control","NA"))) # treat==0
Is this what you want?
#Base R:
apply(df[df$treat == 1, ],1, function(x){
ifelse(any(df[df$treat == 0, 'prob'] -.1 < x[3] & x[3] < df[df$treat == 0, 'prob'] +.1), 'treated', NA)
})
You can invert $treat
clause to reflect control-group and attach the variables to your df.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With