
R equivalent of SELECT DISTINCT on two or more fields/variables

Tags: sql, dataframe, r

Say I have a dataframe df with two or more columns. Is there an easy way to use unique() or another R function to create a subset containing the unique combinations of two or more of those columns?

I know I can use sqldf() and write an easy "SELECT DISTINCT var1, var2, ... varN" query, but I am looking for an R way of doing this.

It occurred to me to try ftable coerced to a dataframe and use the field names, but I also get the cross tabulations of combinations that don't exist in the dataset:

uniques <- as.data.frame(ftable(df$var1, df$var2)) 
asked May 24 '10 at 21:05 by wahalulu




1 Answer

unique() has a data.frame method that drops duplicated rows, so unique(df[c("var1","var2")]) should be what you want.
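As a minimal sketch of the base R approach, the snippet below builds a small made-up data frame (the values and the extra column var3 are assumptions for illustration, not from the question) and extracts the distinct var1/var2 combinations. It also shows duplicated(), which selects the same first occurrences while keeping the remaining columns.

```r
# Hypothetical example data; var3 is deliberately left out of the comparison.
df <- data.frame(
  var1 = c("a", "a", "b", "b", "a"),
  var2 = c(1, 1, 2, 3, 1),
  var3 = c(10, 20, 30, 40, 50)
)

# Distinct combinations of var1 and var2 (the first occurrence of each is kept).
uniques <- unique(df[c("var1", "var2")])
print(uniques)

# Same first occurrences via duplicated(), this time keeping every column.
first_rows <- df[!duplicated(df[c("var1", "var2")]), ]
print(first_rows)
```

Unlike the ftable approach, only combinations that actually occur in the data appear in the result, because no cross tabulation is built.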

Another option is distinct() from the dplyr package:

df %>% distinct(var1, var2) # or distinct(df, var1, var2) 

Note:

Older versions of dplyr (< 0.5.0, released 2016-06-24) required an additional select() step before distinct():

df %>% select(var1, var2) %>% distinct 

(or, in the older nested style, distinct(select(df, var1, var2))).
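For a runnable illustration of the dplyr route, here is a small sketch on made-up data (the values and the column var3 are assumptions). It assumes dplyr 0.5.0 or later is installed and is skipped otherwise; the .keep_all argument is shown as one way to retain the other columns of the first matching row.

```r
# Runs only if dplyr is available; assumes version >= 0.5.0.
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Hypothetical example data.
  df <- data.frame(
    var1 = c("a", "a", "b", "b", "a"),
    var2 = c(1, 1, 2, 3, 1),
    var3 = c(10, 20, 30, 40, 50)
  )

  # Distinct var1/var2 combinations; only those two columns are returned.
  pairs <- dplyr::distinct(df, var1, var2)
  print(pairs)

  # Keep all columns from the first row of each combination.
  full <- dplyr::distinct(df, var1, var2, .keep_all = TRUE)
  print(full)
}
```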

answered Oct 22 '22 at 05:10 by Marek