Stata automatically creates a variable called "_merge" indicating the matched variables in both datasets after merge. Is there a way to get such variable generated by R's merge() function?
The possible values of _merge
in Stata
are (note merge
can also have values 4 and 5)
1 master observation appeared in master only
2 using observation appeared in using only
3 match observation appeared in both
In R
, you can do that by entering the argument as either all=TRUE
or all.x=TRUE
or all.y=TRUE
e.g.,
merge(x, y, by = intersect(names(x), names(y)),by.x = by, by.y = by, all = TRUE)
merge(x, y, by = intersect(names(x), names(y)),by.x = by, by.y = by, all.x = TRUE)
merge(x, y, by = intersect(names(x), names(y)),by.x = by, by.y = by, all.y = TRUE)
I have written the following function based on @Metrics answer. It creates a variable "merge" in the resulting dataset that indicates observations as Stata does.
stata.merge <- function(x,y, by = intersect(names(x), names(y))){
x[is.na(x)] <- Inf
y[is.na(y)] <- Inf
matched <- merge(x, y, by.x = by, by.y = by, all = TRUE)
matched <- matched[complete.cases(matched),]
matched$merge <- "matched"
master <- merge(x, y, by.x = by, by.y = by, all.x = TRUE)
master <- master[!complete.cases(master),]
master$merge <- "master"
using <- merge(x, y, by.x = by, by.y = by, all.y = TRUE)
using <- using[!complete.cases(using),]
using$merge <- "using"
df <- rbind(matched, master,using)
df[sapply(df, is.infinite)] <- NA
df
}
Test.
df1 <- data.frame(id = letters[c(1:5,8:9)], v1=c(1:5,8:9))
df1
id v1
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 h 8
7 i 9
df2 <- data.frame(id = letters[1:8], v1=c(1:7,NA))
df2
id v1
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
7 g 7
8 h NA
stata.merge(df1,df2, by = "id")
id v1.x v1.y merge
1 a 1 1 matched
2 b 2 2 matched
3 c 3 3 matched
4 d 4 4 matched
5 e 5 5 matched
6 h 8 NA matched
7 i 9 NA master
71 f NA 6 using
8 g NA 7 using
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With