How to specify columns to exclude when retaining all distinct rows?

Tags: r, dplyr, tidyverse

How do you keep all distinct rows of a data frame, ignoring certain columns, when you only want to name the columns to exclude? Take the example below:

library(dplyr)
# tibble() replaces the deprecated data_frame()
dat <- tibble(
    x = c("a", "a", "b"),
    y = c("c", "c", "d"),
    z = c("e", "f", "f")
)

I'd like to return the distinct rows across variables x and y, specifying only that column z should be excluded. The result should match what this returns:

dat %>% distinct(x, y)

You might expect the following to work, but it results in an error:

dat %>% distinct(-z)
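The error comes from the fact that distinct() evaluates its arguments with data masking, the same way mutate() does, rather than with tidyselect. So -z is computed as an expression (unary minus applied to a character column) instead of being read as "every column except z". A minimal sketch illustrating this, assuming dplyr is installed; dat_num is a hypothetical numeric example, not part of the question:

```r
library(dplyr)

dat <- tibble(
  x = c("a", "a", "b"),
  y = c("c", "c", "d"),
  z = c("e", "f", "f")
)

# distinct() evaluates `-z` as an expression; negating a character
# vector fails, so dplyr signals an error.
res <- tryCatch(dat %>% distinct(-z), error = function(e) e)
print(inherits(res, "error"))

# Hypothetical numeric data: here `-z` does not fail, but distinct()
# returns the distinct values of a new computed column `-z`
# instead of dropping z.
dat_num <- tibble(x = c(1, 1, 2), z = c(5, 6, 6))
neg <- dat_num %>% distinct(-z)
print(neg)
```

With a numeric z the call therefore "works" silently but answers a different question, which is why a tidyselect-aware helper such as vars() or across() is needed.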

I'd prefer a tidyverse solution.

asked Feb 19 '19 at 17:02 by David Rubinger


2 Answers

Just do:

library(dplyr)

dat %>%
  distinct_at(vars(-z))

Output:

# A tibble: 2 x 2
  x     y    
  <chr> <chr>
1 a     c    
2 b     d    
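To confirm that negating z inside vars() gives the same result as listing the kept columns explicitly, a quick check along these lines should work. It assumes dplyr is installed; distinct_at() is superseded but still exported:

```r
library(dplyr)

dat <- tibble(
  x = c("a", "a", "b"),
  y = c("c", "c", "d"),
  z = c("e", "f", "f")
)

# Exclude z by negation inside vars()
by_exclusion <- dat %>% distinct_at(vars(-z))

# Name the kept columns explicitly, as in the question
by_inclusion <- dat %>% distinct(x, y)

print(by_exclusion)
# Compare as plain data frames so only values and column names matter
print(all.equal(as.data.frame(by_exclusion), as.data.frame(by_inclusion)))
```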

As of dplyr 1.0.0, distinct_at() and the other scoped verbs are superseded, and you can use across() instead:

dat %>% 
  distinct(across(-z))
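If the columns to drop arrive as a character vector (for example, as a function argument), across() can be combined with tidyselect's all_of(). A minimal sketch, assuming dplyr 1.0.0 or later; distinct_except() is a hypothetical helper name, not part of dplyr:

```r
library(dplyr)

# Hypothetical helper: keep the distinct rows across every column
# except those named in `exclude` (a character vector).
distinct_except <- function(df, exclude) {
  df %>% distinct(across(-all_of(exclude)))
}

dat <- tibble(
  x = c("a", "a", "b"),
  y = c("c", "c", "d"),
  z = c("e", "f", "f")
)

result <- distinct_except(dat, "z")
print(result)
```

Using all_of() makes the selection fail loudly if a name in exclude does not exist, rather than silently ignoring it.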
answered Jan 12 '23 at 00:01 by arg0naut91


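We could also build the list of columns to keep programmatically, dropping "z" from the names and splicing the rest in as symbols: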

# names(.) is the piped data frame's column names; drop "z" and splice the rest in as symbols
dat %>% 
    distinct(!!! rlang::syms(setdiff(names(.), "z")))
# A tibble: 2 x 2
#  x     y    
#  <chr> <chr>
#1 a     c    
#2 b     d    
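The . in names(.) is magrittr's placeholder for the piped data, so this version relies on %>% specifically. A minimal sketch of the same idea that computes the kept names up front, so no placeholder is needed; keep is an illustrative variable name, and rlang is assumed to be available (it is installed as a dplyr dependency):

```r
library(dplyr)

dat <- tibble(
  x = c("a", "a", "b"),
  y = c("c", "c", "d"),
  z = c("e", "f", "f")
)

# Names of every column except z
keep <- setdiff(names(dat), "z")

# Turn the names into symbols and splice them into distinct()
result <- distinct(dat, !!!rlang::syms(keep))
print(result)
```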
answered Jan 11 '23 at 23:01 by akrun