How can I join 2 tables with an OR statement in R using dplyrs join functions?
E.g. Join df1 on df2 with the key: df1_ColumnA == df2_ColumnA OR df1_ColumnA == df2_ColumnB?
library(dplyr)
df1 = data.frame(V1 = c('A', 'B', 'C','D'), v2 = c(1,2,3,4))
df2 = data.frame(VA = c('A', 'B', 'C'), VB = c('D', 'E', 'F'), vC = c(4,5,6))
## Join on column 1
df3 = df1 %>% left_join(df2, by = c('V1' = 'VA'))
## join on column 1 AND column 2
df4 = df1 %>% left_join(df2, by = c('V1' = 'VA', 'V1' = 'VB'))
This is not working:
## join on column 1 OR column 2
df4 = df1 %>% left_join(df2, by = c('V1' = 'VA' | 'V1' = 'VB'))
Edit: expected output
V1 v2 VA VB vC
1 A 1 A D 4
2 B 2 B E 5
3 C 3 C F 6
4 D 4 A D 4
The OR
join is not implemented in tidyverse
(could be wrong too). But, there is one option to unite
the columns of interest together andd then do a regex_right_join/regex_left_join
from fuzzyjoin
library(fuzzyjoin)
library(dplyr)
df2 %>%
unite(VAB, VA, VB, remove = FALSE) %>%
regex_right_join(df1, by = c("VAB" = "V1")) %>%
select(names(df1), names(df2))
#. V1 v2 VA VB vC
#1 A 1 A D 4
#2 B 2 B E 5
#3 C 3 C F 6
#4 D 4 A D 4
We can use sqldf
sqldf::sqldf('select a.*,b.* from df1 a left join df2 b on a.V1=b.VA or a.V1=b.VB')
V1 v2 VA VB vC
1 A 1 A D 4
2 B 2 B E 5
3 C 3 C F 6
4 D 4 A D 4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With