Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dplyr join on column A OR column B

Tags:

r

dplyr

How can I join 2 tables with an OR statement in R using dplyrs join functions?

E.g. Join df1 on df2 with the key: df1_ColumnA == df2_ColumnA OR df1_ColumnA == df2_ColumnB?

library(dplyr)

df1 = data.frame(V1 = c('A', 'B', 'C','D'), v2 = c(1,2,3,4))
df2 = data.frame(VA = c('A', 'B', 'C'), VB = c('D', 'E', 'F'), vC = c(4,5,6))

## Join on column 1
df3 = df1 %>% left_join(df2, by = c('V1' = 'VA'))

## join on column 1 AND column 2
df4 = df1 %>% left_join(df2, by = c('V1' = 'VA', 'V1' = 'VB'))

This is not working:

## join on column 1 OR column 2
df4 = df1 %>% left_join(df2, by = c('V1' = 'VA' | 'V1' = 'VB'))

Edit: expected output

   V1 v2 VA VB vC
1  A  1  A  D  4
2  B  2  B  E  5
3  C  3  C  F  6
4  D  4  A  D  4
like image 461
Koot6133 Avatar asked Dec 31 '22 17:12

Koot6133


2 Answers

The OR join is not implemented in tidyverse (could be wrong too). But, there is one option to unite the columns of interest together andd then do a regex_right_join/regex_left_join from fuzzyjoin

library(fuzzyjoin)
library(dplyr)
df2 %>%
    unite(VAB, VA, VB, remove = FALSE) %>%
    regex_right_join(df1, by = c("VAB" = "V1")) %>% 
    select(names(df1), names(df2))
#.  V1 v2 VA VB vC
#1  A  1  A  D  4
#2  B  2  B  E  5
#3  C  3  C  F  6
#4  D  4  A  D  4
like image 147
akrun Avatar answered Jan 08 '23 00:01

akrun


We can use sqldf

sqldf::sqldf('select a.*,b.* from df1 a left join df2 b on a.V1=b.VA or a.V1=b.VB')

   V1 v2 VA VB vC
1  A  1  A  D  4
2  B  2  B  E  5
3  C  3  C  F  6
4  D  4  A  D  4
like image 26
A. Suliman Avatar answered Jan 08 '23 02:01

A. Suliman