Say I have a data frame df with two or more columns. Is there an easy way to use unique() or another R function to create a subset of unique combinations of two or more columns?
I know I can use sqldf() and write an easy "SELECT DISTINCT var1, var2, ... varN" query, but I am looking for an R way of doing this.
It occurred to me to try ftable coerced to a dataframe and use the field names, but I also get the cross tabulations of combinations that don't exist in the dataset:
uniques <- as.data.frame(ftable(df$var1, df$var2))
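To illustrate the problem described above, here is a minimal sketch on made-up data (the values in df are hypothetical; only the column names var1 and var2 come from the question). It shows that the ftable approach yields one row per possible pair, and that the pairs which never occur can be recognised by a zero count in the Freq column:

```r
# Hypothetical example data; var1/var2 mirror the names used in the question.
df <- data.frame(
  var1 = c("a", "a", "b", "a"),
  var2 = c("x", "x", "y", "x"),
  stringsAsFactors = FALSE
)

# ftable() cross-tabulates every value of var1 against every value of var2,
# so the coerced data frame has one row per possible pair, observed or not.
all_pairs <- as.data.frame(ftable(df$var1, df$var2))

# Pairs that never occur in df have a count of zero in the Freq column,
# so filtering on Freq > 0 keeps only the combinations actually present.
observed <- all_pairs[all_pairs$Freq > 0, ]
print(observed)
```

Filtering on Freq works, but it is a workaround rather than a direct way to get the unique combinations.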
Answer. Yes, the DISTINCT clause can be applied to any valid SELECT query. Note that DISTINCT removes duplicate rows, and two rows count as duplicates only if they match in every selected column.
The unique() function in R returns a vector, data frame, or array-like object with duplicate elements (or, for a data frame, duplicate rows) removed.
unique() and dplyr's distinct() return the same rows, with one small difference in the row names: distinct() renumbers the rows sequentially, whereas unique() keeps the row name of the first occurrence of each unique combination. Either way, both functions return the unique rows based on the combined set of columns chosen.
unique works on a data.frame, so unique(df[c("var1","var2")]) should be what you want.
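As a small sketch of the unique() approach, assuming a toy data frame (the values and the extra column var3 are made up for illustration):

```r
# Hypothetical example data; var3 is an extra column we want to ignore.
df <- data.frame(
  var1 = c("a", "a", "b", "a"),
  var2 = c("x", "x", "y", "x"),
  var3 = 1:4
)

# Select the columns of interest first, then drop duplicated rows.
# unique() keeps the first occurrence of each (var1, var2) combination
# and retains that row's original row name.
combos <- unique(df[c("var1", "var2")])
print(combos)
```

Selecting the columns before calling unique() matters: calling unique(df) on the whole data frame would also compare var3 and so would keep all four rows here.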
Another option is distinct from the dplyr package:
df %>% distinct(var1, var2) # or distinct(df, var1, var2)
Note: For older versions of dplyr (< 0.5.0, 2016-06-24), distinct required an additional step: df %>% select(var1, var2) %>% distinct (or, the older way, distinct(select(df, var1, var2))).
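The dplyr approach can be sketched the same way. This assumes dplyr 0.5.0 or later is installed; the data frame and its values are made up for illustration:

```r
library(dplyr)  # assumption: dplyr >= 0.5.0 is installed

# Hypothetical example data; var3 is an extra column we want to ignore.
df <- data.frame(
  var1 = c("a", "a", "b", "a"),
  var2 = c("x", "x", "y", "x"),
  var3 = 1:4
)

# distinct() with column names keeps only those columns and returns
# one row per unique (var1, var2) combination, in order of first appearance.
res <- df %>% distinct(var1, var2)
print(res)
```

If the other columns should be kept alongside the first row of each combination, distinct() also accepts .keep_all = TRUE.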