I have found myself doing a "conditional left join" several times in R. To illustrate with an example, suppose you have two data frames such as:
> df
a b
1 1 0
2 2 0
> other.df
a b
1 2 3
The goal is to end up with this data frame:
> final.df
a b
1 1 0
2 2 3
The code I've written so far:
c <- merge(df, other.df, by=c("a"), all.x = TRUE)
c[is.na(c$b.y),]$b.y <- 0
d<-subset(c, select=c("a","b.y"))
colnames(d)[2] <- "b"
to finally arrive at the result I wanted.
Doing this in effectively four lines makes the code very opaque. Is there any better, less cumbersome way to do this?
Here are two ways. In both cases the first line does a left merge returning the required columns. In the case of merge we then have to set the names. The final line in both cases replaces NAs with 0.
merge
res1 <- merge(df, other.df, by = "a", all.x = TRUE)[-2]
names(res1) <- names(df)
res1[is.na(res1)] <- 0
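For anyone who wants to run the merge version end to end, here is a minimal self-contained sketch. The data.frame() calls are my reconstruction of the printed data in the question, not code from the original post, so treat the column types as an assumption.

```r
# Assumed reconstruction of the example data from the question
df <- data.frame(a = c(1, 2), b = c(0, 0))
other.df <- data.frame(a = 2, b = 3)

# Left merge on "a"; the result has columns a, b.x, b.y, and [-2] drops b.x
res1 <- merge(df, other.df, by = "a", all.x = TRUE)[-2]

# Restore the original column names (b.y becomes b)
names(res1) <- names(df)

# Rows of df with no match in other.df come back as NA; set them to 0
res1[is.na(res1)] <- 0

print(res1)
#   a b
# 1 1 0
# 2 2 3
```

Note that the last line replaces every NA in the result, so this only fits when 0 is a sensible default for all of the returned columns.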
sqldf
library(sqldf)
res2 <- sqldf("select a, o.b from df left join 'other.df' o using(a)")
res2[is.na(res2)] <- 0