Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to join by everything except specified columns in dplyr?

Tags:

r

dplyr

I have two datasets that share all columns and I would like to make an anti-join based on all of the columns except two of them.

For example, I would like to do something like the following:

library(dplyr)
df1 <- tibble(x = c("A", "B", "C"), y = c("X", "Y", "Z"), z = c(1, 2, 3),
              a = c(4, 5, 6))

df2 <- tibble(x = c("A", "D", "E"), y = c("X", "W", "R"), z = c(1, 5, 6),
              a = c(4, 7, 8))

df2 %>% anti_join(df1, join_by(-c(z, a)))
#> Error in `join_by()`:
#> ! Expressions must use one of: `==`, `>=`, `>`, `<=`, `<`, `closest()`,
#>   `between()`, `overlaps()`, or `within()`.
#> ℹ Expression 1 is `-c(z, a)`.

#> Backtrace:
#>      ▆
#>   1. ├─df2 %>% anti_join(df1, join_by(-c(z, a)))
#>   2. ├─dplyr::anti_join(., df1, join_by(-c(z, a)))
#>   3. ├─dplyr:::anti_join.data.frame(., df1, join_by(-c(z, a)))
#>   4. │ └─dplyr:::join_filter(...)
#>   5. │   └─dplyr:::is_cross_by(by)
#>   6. │     └─rlang::is_character(x, n = 0L)
#>   7. └─dplyr::join_by(-c(z, a))
#>   8.   └─dplyr:::parse_join_by_expr(exprs[[i]], i, error_call = error_call)
#>   9.     └─dplyr:::stop_invalid_top_expression(expr, i, error_call)
#>  10.       └─rlang::abort(message, call = call)

Created on 2023-03-27 with reprex v2.0.2

So, is there any option to tidy-select variables in joins? Or, particularly, to call a join by everything except some variables.

like image 619
Alberto Agudo Dominguez Avatar asked Sep 02 '25 14:09

Alberto Agudo Dominguez


1 Answers

select() the unwanted columns out of df2 rather than trying to specify in join_by():

library(dplyr)

df2 %>%
  anti_join(select(df1, -c(z, a)))

# Joining with `by = join_by(x, y)`
# # A tibble: 2 × 4
#   x     y         z     a
#   <chr> <chr> <dbl> <dbl>
# 1 D     W         5     7
# 2 E     R         6     8

For a standard join, do the same thing if you want to discard df2$z and $a. Otherwise, append a suffix using rename_with():

df2 %>%
  full_join(
    rename_with(df1, \(x) paste0(x, ".df1"), c(z, a))
  )

# Joining with `by = join_by(x, y)`
# # A tibble: 5 × 6
#   x     y         z     a z.df1 a.df1
#   <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 A     X         1     4     1     4
# 2 D     W         5     7    NA    NA
# 3 E     R         6     8    NA    NA
# 4 B     Y        NA    NA     2     5
# 5 C     Z        NA    NA     3     6
like image 143
zephryl Avatar answered Sep 05 '25 10:09

zephryl