How do you retain all distinct rows in a data frame, ignoring certain columns, by specifying only the columns you want to exclude? In the example below
library(dplyr)
dat <- tibble(  # data_frame() is deprecated in favor of tibble()
x = c("a", "a", "b"),
y = c("c", "c", "d"),
z = c("e", "f", "f")
)
I'd like to return a data frame with all distinct rows among variables x and y by only specifying that I'd like to exclude column z. The data frame returned should look like the one returned from here:
dat %>% distinct(x, y)
You would think you could do the following, but it results in an error:
dat %>% distinct(-z)
I prefer a tidyverse solution.
Adding the DISTINCT keyword to a SELECT query causes it to return only unique combinations of values for the specified column list, so duplicate rows are removed from the result set. Since DISTINCT operates on all of the fields in SELECT's column list, it can't be applied to an individual field that is part of a larger group.
Yes, DISTINCT works on all combinations of column values for all columns in the SELECT clause.
Just do:
library(dplyr)
dat %>%
distinct_at(vars(-z))
Output:
# A tibble: 2 x 2
x y
<chr> <chr>
1 a c
2 b d
And as of dplyr 1.0.0, you can use across:
dat %>%
distinct(across(-z))
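Two further variants may be worth noting. This is only a sketch: it reuses the dat from the question, and the pick() form assumes dplyr 1.1.0 or later, which none of the answers here mention. Dropping the column with select() first and then calling distinct() with no arguments should give the same rows, and pick() accepts the same negative selection that across() does.

```r
library(dplyr)

# Same data as in the question, built with tibble()
dat <- tibble(
  x = c("a", "a", "b"),
  y = c("c", "c", "d"),
  z = c("e", "f", "f")
)

# Variant 1: drop z, then deduplicate on the remaining columns
res_select <- dat %>%
  select(-z) %>%
  distinct()

# Variant 2: pick() selects columns inside distinct()
# (assumption: dplyr >= 1.1.0 is installed)
res_pick <- dat %>%
  distinct(pick(-z))
```

Both results contain only x and y. The select() form works on older dplyr versions too, so it may be the safer choice if the installed version is unknown.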
We could also build the column list programmatically and splice it in:
dat %>%
distinct(!!! rlang::syms(setdiff(names(.), "z")))
# A tibble: 2 x 2
# x y
# <chr> <chr>
#1 a c
#2 b d