
How do you do conditional "left join" in R?

I have found myself doing a "conditional left join" several times in R. To illustrate with an example, suppose you have two data frames such as:

> df
    a b
  1 1 0
  2 2 0

> other.df
    a b
  1 2 3

The goal is to end up with this data frame:

> final.df
    a b
  1 1 0
  2 2 3
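
For reproducibility, the frames above can be built like this. This is a minimal sketch: it assumes both columns are plain numerics, and final.df is simply the target written out by hand.

```r
# Input frames from the example (column types assumed numeric).
df <- data.frame(a = c(1, 2), b = c(0, 0))
other.df <- data.frame(a = 2, b = 3)

# The desired result, written out by hand for later comparison.
final.df <- data.frame(a = c(1, 2), b = c(0, 3))
```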

The code I've written so far:

c <- merge(df, other.df, by=c("a"), all.x = TRUE)
c[is.na(c$b.y),]$b.y <- 0
d<-subset(c, select=c("a","b.y"))
colnames(d)[2] <- "b"

to finally arrive at the result I wanted.

Doing this in effectively four lines makes the code very opaque. Is there any better, less cumbersome way to do this?

asked Nov 14 '22 at 02:11 by svenski


1 Answer

Here are two ways. In both cases the first line does a left merge and keeps only the required columns. With merge we then have to reset the names, because the clashing b columns come back as b.x and b.y. The final line in both cases replaces NAs with 0.

merge

res1 <- merge(df, other.df, by = "a", all.x = TRUE)[-2]
names(res1) <- names(df)
res1[is.na(res1)] <- 0

sqldf

library(sqldf)
res2 <- sqldf("select a, o.b from df left join 'other.df' o using(a)")
res2[is.na(res2)] <- 0
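
Note that both versions above fill unmatched rows with a literal 0, which gives the wanted result here only because df$b is 0 everywhere. If the intent is instead "take b from other.df where there is a match, otherwise keep df's own b" (an assumption about the question, not something it states), a variant along these lines might be closer. The name res3 is just for illustration.

```r
# Example data from the question (column types assumed numeric).
df <- data.frame(a = c(1, 2), b = c(0, 0))
other.df <- data.frame(a = 2, b = 3)

# Left merge keeps every row of df; the clashing columns become
# b.x (from df) and b.y (from other.df, NA where nothing matched).
m <- merge(df, other.df, by = "a", all.x = TRUE)

# Prefer other.df's value, fall back to df's own value when there is no match.
res3 <- data.frame(a = m$a, b = ifelse(is.na(m$b.y), m$b.x, m$b.y))
```

Keep in mind that merge sorts the result by the key column by default, so the row order may differ from the original df.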

answered Nov 17 '22 at 05:11 by G. Grothendieck