Unique rows, considering two columns, in R, without order

Tags: r, unique, dplyr, plyr

Unlike the questions I've found, I want to find the unique rows across two columns while ignoring the order of the values within each row.

I have a df:

df<-cbind(c("a","b","c","b"),c("b","d","e","a"))
> df
     [,1] [,2]
 [1,] "a"  "b" 
 [2,] "b"  "d" 
 [3,] "c"  "e" 
 [4,] "b"  "a" 

In this case, row 1 and row 4 are "duplicates" in the sense that a-b is the same as b-a.

I know how to find the unique combinations of columns 1 and 2, but under that approach every row would count as unique.
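To make the problem concrete: base R's `unique()` on a matrix compares rows position by position, so it treats "a","b" and "b","a" as different rows and drops nothing. A minimal check using the question's own data:

```r
# Build the example matrix from the question
df <- cbind(c("a", "b", "c", "b"), c("b", "d", "e", "a"))

# unique() compares whole rows in order, so rows 1 and 4 differ
unique(df)
nrow(unique(df))  # 4: no row is dropped
```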

asked Feb 18 '15 by eflores89


2 Answers

If it's just two columns, you can also use pmin and pmax, like this:

library(data.table)
unique(as.data.table(df)[, c("V1", "V2") := list(pmin(V1, V2),
                         pmax(V1, V2))], by = c("V1", "V2"))
#    V1 V2
# 1:  a  b
# 2:  b  d
# 3:  c  e
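If you'd rather not depend on data.table, the same pmin/pmax idea can be sketched in base R (this is not part of the original answer). It builds an order-independent key as a two-column matrix and uses `duplicated()`, which works row by row on a matrix, to keep the first occurrence of each pair. Unlike the data.table version, the kept rows stay in their original orientation; with this data every first occurrence is already in sorted order, so the result looks the same.

```r
df <- cbind(c("a", "b", "c", "b"), c("b", "d", "e", "a"))

# Order-independent key: smaller value first, larger value second
key <- cbind(pmin(df[, 1], df[, 2]), pmax(df[, 1], df[, 2]))

# Keep the first occurrence of each key; rows are returned unchanged
res <- df[!duplicated(key), , drop = FALSE]
res
```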

A similar approach using "dplyr" might be:

library(dplyr)
data.frame(df, stringsAsFactors = FALSE) %>% 
  mutate(key = paste(pmin(X1, X2), pmax(X1, X2), sep = "_")) %>% 
  distinct(key, .keep_all = TRUE)  # keep X1 and X2, not just key
#   X1 X2 key
# 1  a  b a_b
# 2  b  d b_d
# 3  c  e c_e
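Pasting the pair into a single string key is convenient, but if the pieces are joined with no separator, two different pairs can produce the same key. The values below are made up purely to show the collision; keying on the two sorted columns directly avoids it.

```r
# Hypothetical values chosen to trigger a collision
x1 <- c("ab", "a")
x2 <- c("c",  "bc")

# With no separator, both pairs become "abc"
pasted <- paste0(pmin(x1, x2), pmax(x1, x2))
duplicated(pasted)  # second pair wrongly flagged as a duplicate

# Keeping the two parts as separate columns keeps the pairs distinct
key <- cbind(pmin(x1, x2), pmax(x1, x2))
duplicated(key)
```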
answered Oct 25 '22 by A5C1D2H2I1M1N2O1R2T1


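There are lots of ways to do this; here is one:

# Sort the values within each row, then compare rows
unique(t(apply(df, 1, sort)))      # unique rows (values sorted)
duplicated(t(apply(df, 1, sort)))  # TRUE for rows that repeat an earlier one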

The first returns the unique rows (with the values in each row sorted); the second returns a logical mask flagging the duplicated rows.
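The mask is handy when you want the original rows back rather than their sorted versions. A short sketch using the question's data: `apply(df, 1, sort)` returns one column per row, so `t()` turns it back into one row per original row, and negating the mask subsets `df` itself.

```r
df <- cbind(c("a", "b", "c", "b"), c("b", "d", "e", "a"))

# Sorted copy of each row, one row per original row
sorted <- t(apply(df, 1, sort))

dup <- duplicated(sorted)  # FALSE FALSE FALSE TRUE
df[!dup, , drop = FALSE]   # original rows, original orientation
```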

answered Oct 25 '22 by jimmyb